Tim Holdsworth

Reputation: 499

Neo4j: How to Return the Top Nodes for Each Property Value

I'm running PageRank on nodes labeled Paper, each of which has a year property. I then normalize each PageRank score by year, using the mean and standard deviation of the PageRank scores of all papers from that year.
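To make the per-year normalization concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory list of (title, year, page_rank) tuples standing in for the PageRank stream. It keeps the sign of the z-score, whereas the queries in this thread wrap the difference in ABS. The 0.0 fallback for a year with no spread (for example, a single paper) is an assumption of the sketch.

```python
from collections import defaultdict
from statistics import mean, stdev


def scale_by_year(papers):
    """Return (year, title, z) for every paper, where z is the paper's PageRank
    standardized against all papers from the same year.

    `papers` is a hypothetical iterable of (title, year, page_rank) tuples.
    """
    by_year = defaultdict(list)
    for title, year, page_rank in papers:
        by_year[year].append((title, page_rank))

    scaled = []
    for year, rows in by_year.items():
        scores = [pr for _, pr in rows]
        avg = mean(scores)
        # Sample standard deviation (N - 1 denominator), like Cypher's stDev();
        # statistics.stdev needs at least two values, so fall back to 0.0.
        sd = stdev(scores) if len(scores) > 1 else 0.0
        for title, pr in rows:
            # Assumption: report 0.0 when the year has no spread at all.
            z = (pr - avg) / sd if sd > 0 else 0.0
            scaled.append((year, title, z))
    return scaled
```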

I would like to return the top 100 papers (based on scaled PageRank values) for each year. Can I do this in a single query?

The query below calculates the scaled scores and returns the top 100 results overall, rather than the top 100 per year:

CALL algo.pageRank.stream(
  'MATCH (p:Paper) WHERE p.year < 2015 RETURN id(p) as id',
  'MATCH (p1:Paper)-[:CITES]->(p2:Paper) RETURN id(p1) as source, id(p2) as target',
  {graph:'cypher', iterations:20, write:false, concurrency:20})
YIELD node, score
WITH 
  node.title AS title,
  node.year AS year, 
  score AS page_rank
ORDER BY page_rank DESC
LIMIT 100
WITH year, COLLECT({title: title, page_rank: page_rank}) AS data, AVG(page_rank) AS avg_page_rank, stDev(page_rank) as stdDev
UNWIND data AS d
RETURN year, d.title AS title, ABS(d.page_rank-avg_page_rank)/stdDev AS scaled_score;

Any suggestions would be greatly appreciated!

Upvotes: 1

Views: 591

Answers (1)

cybersam

Reputation: 67044

Try this:

// Stream PageRank scores for papers published before 2015 (legacy Graph Algorithms library)
CALL algo.pageRank.stream(
  'MATCH (p:Paper) WHERE p.year < 2015 RETURN id(p) as id',
  'MATCH (p1:Paper)-[:CITES]->(p2:Paper) RETURN id(p1) as source, id(p2) as target',
  {graph:'cypher', iterations:20, write:false, concurrency:20})
YIELD node, score
WITH
  node.title AS title,
  node.year AS year,
  score AS page_rank
// Sort first so that each per-year list below is collected in descending order
ORDER BY page_rank DESC
// Group by year: AVG and stDev cover ALL papers in the year; [..100] keeps the 100 highest
WITH year, COLLECT({title: title, page_rank: page_rank})[..100] AS data, AVG(page_rank) AS avg_page_rank, stDev(page_rank) AS stdDev
UNWIND data AS d
RETURN year, d.title AS title, ABS(d.page_rank - avg_page_rank) / stdDev AS scaled_score;

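This query removes the LIMIT clause and instead slices each year's collected list with [..100], keeping that year's 100 highest-ranked papers. Because the rows are sorted by page_rank before COLLECT, each per-year list is already in descending order. Moving the cut after the aggregation also means AVG and stDev are now computed over all papers in each year, as the question intends, rather than over only the 100 rows that survived a global LIMIT. Within one year, (page_rank - avg_page_rank) / stdDev increases with page_rank (assuming stdDev is positive), so ranking by raw PageRank is the same as ranking by the signed scaled score; the ABS only changes the reported value for papers below their year's mean.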
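The same logic as a small Python sketch, again over a hypothetical list of (title, year, page_rank) tuples, shows both effects of the change: the cut happens per year, and the statistics use every paper in that year rather than only the survivors. The 0.0 fallback for a zero standard deviation is an assumption of the sketch, not something the query does.

```python
from collections import defaultdict
from statistics import mean, stdev


def top_n_per_year(papers, n=100):
    """Mirror the answer's query: sort by raw PageRank, group by year, compute
    each year's statistics over all of its papers, then keep the first n.

    `papers` is a hypothetical iterable of (title, year, page_rank) tuples.
    """
    by_year = defaultdict(list)
    # Like ORDER BY page_rank DESC: sorting first means each per-year list is
    # built in descending order, which is what the [..100] slice relies on.
    for title, year, pr in sorted(papers, key=lambda p: p[2], reverse=True):
        by_year[year].append((title, pr))

    result = []
    for year, rows in by_year.items():
        scores = [pr for _, pr in rows]
        avg = mean(scores)                              # AVG over every paper in the year
        sd = stdev(scores) if len(scores) > 1 else 0.0  # stDev over every paper in the year
        for title, pr in rows[:n]:                      # like COLLECT(...)[..100]
            scaled = abs(pr - avg) / sd if sd > 0 else 0.0
            result.append((year, title, scaled))
    return result
```

For example, top_n_per_year(rows, n=100) returns at most 100 entries per year, each scored against its whole year.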

Upvotes: 4
