Tim Holdsworth

Reputation: 499

How to Normalize PageRank Scores

I'm running PageRank on a group of nodes, where each node has a property year. How can I calculate the average PageRank score for each value of the year property? That is to say, if there are 100 nodes with a total of 20 different year values, I would like to calculate 20 average PageRank values.

Then, for each node, I'd like to calculate a scaled score based on the difference between the node's PageRank score and the average PageRank score of papers in that year (where the average for a year is based on the PageRank scores of all nodes with that same value for the year property).

The code to run PageRank is:

CALL algo.pageRank.stream(
  'MATCH (p:Paper) WHERE p.year < 2015 RETURN id(p) as id',
  'MATCH (p1:Paper)-[:CITES]->(p2:Paper) RETURN id(p1) as source, id(p2) as target',
  {graph:'cypher', iterations:20, write:false, concurrency:20})
YIELD node, score
WITH *,
  node.title AS title,
  node.year AS year,
  score AS page_rank
ORDER BY page_rank DESC
LIMIT 10000
RETURN title, year, page_rank;

How can I alter this code to return the scaled score?

Any help is greatly appreciated!

Upvotes: 0

Views: 1125

Answers (1)

cybersam

Reputation: 66989

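This query should return the scaled_score for each year/title combination, computed as the absolute difference between the title's page_rank and the average page_rank for that year, divided by that average (the lower the scaled score, the closer the title's page_rank is to the average for that year). Note that because the LIMIT is applied before the aggregation, each year's average covers only the 10,000 highest-ranked papers; drop the LIMIT if the average should cover every paper in that year.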

CALL algo.pageRank.stream(
  'MATCH (p:Paper) WHERE p.year < 2015 RETURN id(p) as id',
  'MATCH (p1:Paper)-[:CITES]->(p2:Paper) RETURN id(p1) as source, id(p2) as target',
  {graph:'cypher', iterations:20, write:false, concurrency:20})
YIELD node, score
WITH 
  node.title AS title,
  node.year AS year, 
  score AS page_rank
ORDER BY page_rank DESC
LIMIT 10000
WITH year, COLLECT({title: title, page_rank: page_rank}) AS data, AVG(page_rank) AS avg_page_rank
UNWIND data AS d
RETURN year, d.title AS title, ABS(d.page_rank-avg_page_rank)/avg_page_rank AS scaled_score;

You may also want to order the results (say, by year or scaled_score).
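
To make the arithmetic concrete, here is a minimal Python sketch of the same per-year calculation, run on made-up sample rows. The function name scaled_scores and the sample data are hypothetical and only for illustration; this is not part of the Neo4j query or its API. The sketch averages over whatever rows it is given, and it assumes every year's average is non-zero.

```python
from collections import defaultdict

def scaled_scores(rows):
    """Given (title, year, page_rank) tuples, return (year, title, scaled_score).

    scaled_score = |page_rank - avg_for_year| / avg_for_year, mirroring the
    RETURN clause of the Cypher query. Hypothetical helper for illustration.
    """
    rows = list(rows)

    # First pass: accumulate the sum and count of page_rank per year.
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _, year, page_rank in rows:
        totals[year] += page_rank
        counts[year] += 1
    averages = {year: totals[year] / counts[year] for year in totals}

    # Second pass: scale each row against its own year's average.
    # Assumes each average is non-zero, otherwise this divides by zero.
    return [
        (year, title, abs(page_rank - averages[year]) / averages[year])
        for title, year, page_rank in rows
    ]

# Made-up sample rows: two papers from 2010, one from 2012.
sample = [("A", 2010, 1.0), ("B", 2010, 3.0), ("C", 2012, 0.5)]
results = scaled_scores(sample)
for row in results:
    print(row)
```

In this sample the 2010 average is 2.0, so papers A and B both end up 0.5 away from it in relative terms, while C is the only 2012 paper and therefore sits exactly at its year's average.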

Upvotes: 1
