Tim Holdsworth

Find Average Number of Relationships with WHERE

I have a dataset with paper and author nodes. The relationships represent citations (paper to paper) and authorship (author to paper).

For every author, I would like to count the papers they have written and the citations those papers have received, so that I can calculate the average number of citations per paper.

However, the paper nodes have a year attribute that I would like to filter on, so that I can find an author's average number of citations per paper up to a given year.

That is to say, for an author, find the papers written up to a certain year, count the papers from up to that year that cite them, and return the number of citations divided by the number of papers as the average.

The code I have so far is:

MATCH (a:Author)-[:AUTHORED]->(q:Paper)
WHERE q.year <= 2008
WITH a, count(q) as papers_written
MATCH (p:Paper)-[:CITES]->(q)
WHERE p.year <= 2008
WITH count(p) as citations, a, papers_written
RETURN a.name, citations, papers_written

For some reason this drastically overcounts the number of citations when I check for a single author. Any idea how I can update this query?

I have seen the idea of using size((p:Quanta)-[:CITES]->(q)), which seems to return the number of citations in general, but size((p:Quanta)-[:CITES]->(q) WHERE p.year <= 2019) does not seem to be valid syntax.

Any suggestions would be greatly appreciated!

Answers (1)

cybersam

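The main issue is that the following WITH clause does not include q, so q is no longer bound to anything after that clause (the q in your second MATCH is then a fresh variable that matches every cited paper in the graph, which is why every author gets an inflated citation count):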

WITH a, count(q) as papers_written
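A minimal sketch of another way to fix it, assuming Neo4j 4.x or 5.x: count each paper's citations while q is still bound, and only aggregate per author afterwards. This also shows how the size(...) idea from the question can take a WHERE filter: the pattern goes into a pattern comprehension, [pattern WHERE condition | p], and size() measures the resulting list. The labels, relationship types, and 2008 cutoff are the ones from the question; q_citations is just an illustrative name.

```cypher
MATCH (a:Author)-[:AUTHORED]->(q:Paper)
WHERE q.year <= 2008
// q is still bound here, so the comprehension only follows
// citations that point at this particular paper q.
WITH a, q,
     size([(p:Paper)-[:CITES]->(q) WHERE p.year <= 2008 | p]) AS q_citations
// Aggregate per author only after the per-paper counts exist.
WITH a, count(q) AS papers_written, sum(q_citations) AS citations
RETURN a.name, citations, papers_written
```

Note that this counts every matching citation, so a paper that cites two of the author's papers contributes 2, whereas COUNT(DISTINCT p) counts it once.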

Assuming Author nodes have unique name values, this query should do what you intended:

MATCH (a:Author)-[:AUTHORED]->(q:Paper)
WHERE q.year <= 2008
OPTIONAL MATCH (q)<-[:CITES]-(p:Paper)
WHERE p.year <= 2008
RETURN a.name, COUNT(DISTINCT p) AS citations, COUNT(DISTINCT q) AS papers_written
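The query above returns the two counts but not the ratio the question asks for. A minimal extension, under the same assumptions, computes citations per paper. It groups on the node a rather than a.name, so it does not depend on names being unique, and it converts to float because dividing two integers in Cypher yields a truncated integer.

```cypher
MATCH (a:Author)-[:AUTHORED]->(q:Paper)
WHERE q.year <= 2008
OPTIONAL MATCH (q)<-[:CITES]-(p:Paper)
WHERE p.year <= 2008
// Group on the node itself; papers_written is at least 1 here
// because the first MATCH requires one authored paper.
WITH a, COUNT(DISTINCT p) AS citations, COUNT(DISTINCT q) AS papers_written
RETURN a.name, citations, papers_written,
       toFloat(citations) / papers_written AS avg_citations_per_paper
```

COUNT(DISTINCT p) counts distinct citing papers rather than citations. If every citation should count, count(p) can be used instead: the OPTIONAL MATCH yields one row per matched citation, and count() skips the null p left for papers with no qualifying citations.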
