I'm running PageRank on a group of nodes, where each node has a property year. How can I calculate the average of the PageRank scores grouped by the year property? That is to say, if there are 100 nodes with a total of 20 different year values, I would like to calculate 20 average PageRank values.
Then, for each node, I'd like to calculate a scaled score based on the difference between its PageRank score and the average PageRank score of papers in that year (where the average for that year is based on the PageRank scores of all nodes with the same value for the year property).
The code to run PageRank is:
CALL algo.pageRank.stream(
  'MATCH (p:Paper) WHERE p.year < 2015 RETURN id(p) as id',
  'MATCH (p1:Paper)-[:CITES]->(p2:Paper) RETURN id(p1) as source, id(p2) as target',
  {graph:'cypher', iterations:20, write:false, concurrency:20})
YIELD node, score
WITH
  *,
  node.title AS title,
  node.year AS year,
  score AS page_rank
ORDER BY page_rank DESC
LIMIT 10000
RETURN
  title,
  year,
  page_rank;
How can I alter this code to return the scaled score?
Any help is greatly appreciated!
This query should return the scaled_score (as an absolute value) for each year/title combination (the lower the scaled score, the closer the title's page_rank is to the average for that year):
CALL algo.pageRank.stream(
  'MATCH (p:Paper) WHERE p.year < 2015 RETURN id(p) as id',
  'MATCH (p1:Paper)-[:CITES]->(p2:Paper) RETURN id(p1) as source, id(p2) as target',
  {graph:'cypher', iterations:20, write:false, concurrency:20})
YIELD node, score
WITH
  node.title AS title,
  node.year AS year,
  score AS page_rank
ORDER BY page_rank DESC
LIMIT 10000
WITH year, COLLECT({title: title, page_rank: page_rank}) AS data, AVG(page_rank) AS avg_page_rank
UNWIND data AS d
RETURN year, d.title AS title, ABS(d.page_rank - avg_page_rank) / avg_page_rank AS scaled_score;
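Written out, the final RETURN computes the following for a paper i with year y. The symbols are introduced here only for illustration: r_i is the paper's page_rank, N_y is the set of rows with year y that survive LIMIT 10000, the bar denotes that year's average (avg_page_rank), and s_i is scaled_score.

```latex
\bar{r}_y = \frac{1}{|N_y|} \sum_{j \in N_y} r_j,
\qquad
s_i = \frac{\lvert r_i - \bar{r}_y \rvert}{\bar{r}_y}
```

So s_i is the deviation from the year's average expressed as a fraction of that average: s_i = 0.5 means the paper's score is 50% above or below the average for its year.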
You may also want to order the results (say, by year or scaled_score).
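In the query above, LIMIT 10000 is applied before the aggregation, so each year's average covers only the papers among the 10,000 highest PageRank scores, not all papers of that year. A minimal, untested sketch that aggregates first, keeps the sign of the difference (so papers above and below the average can be told apart), and orders the output is shown below. It assumes the same legacy algo.pageRank.stream call and Paper/CITES schema as above; signed_score is a hypothetical column name.

```cypher
// Same PageRank call as above; no LIMIT before aggregation,
// so every scored paper contributes to its year's average.
CALL algo.pageRank.stream(
  'MATCH (p:Paper) WHERE p.year < 2015 RETURN id(p) as id',
  'MATCH (p1:Paper)-[:CITES]->(p2:Paper) RETURN id(p1) as source, id(p2) as target',
  {graph:'cypher', iterations:20, write:false, concurrency:20})
YIELD node, score
// node.year is the grouping key: one row (and one average) per distinct year
WITH node.year AS year,
     COLLECT({title: node.title, page_rank: score}) AS data,
     AVG(score) AS avg_page_rank
UNWIND data AS d
// signed_score (hypothetical name): > 0 above the year's average, < 0 below it
RETURN year,
       d.title AS title,
       d.page_rank AS page_rank,
       avg_page_rank,
       (d.page_rank - avg_page_rank) / avg_page_rank AS signed_score
ORDER BY year, signed_score DESC;
```

Wrapping the last expression in ABS() gives back the scaled_score of the original query. If the output needs to be capped, a LIMIT placed after the final ORDER BY no longer affects the averages.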