Reputation: 499
I have a dataset with paper and author nodes. The relationships represent citations (paper to paper) and authorship (author to paper).
For all authors, I would like to calculate the number of papers they have written and the number of citations they received, in order to calculate the average number of citations per paper.
However, the paper nodes have a year attribute that I would like to filter on, so as to find the average number of citations per paper for an author in a given year.
That is to say, for an author, find the papers written up to a given year, count the papers from that same period that cite them, and return the latter divided by the former as the average.
The code I have so far is:
MATCH (a:Author)-[:AUTHORED]->(q:Paper) WHERE q.year <= 2008
WITH a, count(q) as papers_written
MATCH (p:Paper)-[:CITES]->(q) WHERE p.year <= 2008
WITH count(p) as citations, a, papers_written
RETURN a.name, citations, papers_written
For some reason this drastically overcounts the number of citations when I check for a single author. Any idea how I can update this query?
I have seen the idea of doing:
size((p:Quanta)-[:CITES]->(q))
which seems to get number of citations in general, but when I do
size((p:Quanta)-[:CITES]->(q) WHERE p.year <= 2019)
this doesn't seem to work syntactically.
Any suggestions would be greatly appreciated!
Upvotes: 0
Views: 454
Reputation: 67044
The main issue is that the following WITH clause does not carry q forward, so q is no longer bound after it and the second MATCH treats it as a new variable that can match any node, which inflates the citation count:
WITH a, count(q) as papers_written
Assuming Author nodes have unique name values, this query should do what you expected:
MATCH (a:Author)-[:AUTHORED]->(q:Paper)
WHERE q.year <= 2008
OPTIONAL MATCH (q)<-[:CITES]-(p:Paper)
WHERE p.year <= 2008
RETURN a.name, COUNT(DISTINCT p) AS citations, COUNT(DISTINCT q) AS papers_written
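If you also want the average itself, one possible variant is sketched below. It assumes a Neo4j version that supports pattern comprehensions, which also gives a way to apply a WHERE filter inside size(). The aliases cites and avg_citations_per_paper are names chosen here for illustration. Note that this version sums citations per paper, so a citing paper that references two of the author's papers counts twice, whereas COUNT(DISTINCT p) above counts it once.

```cypher
// Papers by each author, up to and including 2008
MATCH (a:Author)-[:AUTHORED]->(q:Paper)
WHERE q.year <= 2008
// For each paper, count the citing papers that also fall in the period
WITH a, q, size([(q)<-[:CITES]-(p:Paper) WHERE p.year <= 2008 | p]) AS cites
// Aggregate per author; toFloat avoids integer division
RETURN a.name AS author,
       count(q) AS papers_written,
       sum(cites) AS citations,
       toFloat(sum(cites)) / count(q) AS avg_citations_per_paper
```

Because every returned author has at least one matching paper, count(q) is never zero in the division. Authors with no papers in the period do not appear in the result at all.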
Upvotes: 1