The shared data is a SQL dump of the CiteSeerX database with three
tables: citations, citationContexts, and papers.
- Important fields of table papers:
- (1) id: each PDF has a different id; this id is referred to as paperid in table citations;
- (2) cluster: all PDFs of the same paper (a paper may have more than one PDF in our database) share one cluster number.
- Important fields of table citations:
- (1) id: this id is referred to as citationid in table citationContexts;
- (2) cluster: the cluster number of the cited paper;
- (3) paperid: the id of the citing document.
- Important fields of table citationContexts:
- (1) citationid: links to the citations table;
- (2) context: the citation context; each citation in the text is surrounded by =-= and -=-.
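
The relationships above can be sketched in Python. This is only an illustration, not part of the dump: it builds a tiny in-memory SQLite database whose table and field names follow the description, joins the three tables, and extracts the text between =-= and -=-. The column types, the sample rows, and the helper name marked_citations are assumptions made for the example; the real dump may use a different SQL dialect and contain more fields.

```python
import re
import sqlite3

# Toy schema and rows, invented for illustration only.
# Column types are assumptions; only the field names come from the description.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE papers (id TEXT PRIMARY KEY, cluster INTEGER);
CREATE TABLE citations (id INTEGER PRIMARY KEY, cluster INTEGER, paperid TEXT);
CREATE TABLE citationContexts (citationid INTEGER, context TEXT);
INSERT INTO papers VALUES ('p1', 100), ('p2', 200), ('p3', 200);
INSERT INTO citations VALUES (1, 200, 'p1');
INSERT INTO citationContexts
    VALUES (1, 'as shown by =-=Smith et al. 2001-=- in prior work');
""")

# citationContexts.citationid -> citations.id
# citations.paperid           -> papers.id (the citing document)
# citations.cluster           =  cluster number of the cited paper
QUERY = """
SELECT p.id      AS citing_paper,
       p.cluster AS citing_cluster,
       c.cluster AS cited_cluster,
       x.context
FROM citationContexts AS x
JOIN citations AS c ON c.id = x.citationid
JOIN papers    AS p ON p.id = c.paperid
"""

# Non-greedy match so several marked citations in one context stay separate.
MARKED = re.compile(r"=-=(.*?)-=-", re.DOTALL)


def marked_citations(context):
    """Return the citation strings wrapped in =-= ... -=- (hypothetical helper)."""
    return MARKED.findall(context)


rows = conn.execute(QUERY).fetchall()
for citing_paper, citing_cluster, cited_cluster, context in rows:
    print(citing_paper, citing_cluster, cited_cluster, marked_citations(context))
```

In the toy data, papers p2 and p3 share cluster 200, so the single citation from p1 points at that cluster as a whole rather than at one specific PDF.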