I have a particular Cypher query that runs personalized PageRank over a set of source nodes. I want to RETURN the top n scoring nodes, including their PageRank scores, additional properties, and all relationships between these nodes.
With the help of SO I've gotten to this point:
MATCH (p) WHERE p.paper_id IN $paper_ids
CALL algo.pageRank.stream(null, null, {direction: "BOTH", sourceNodes: [p]})
YIELD nodeId, score
WITH p, nodeId, score ORDER BY score DESC
LIMIT 25
MATCH (n) WHERE id(n) = nodeId
WITH collect(nodeId) as ids, collect(n {.*, score}) as nodes
CALL apoc.algo.cover(ids) YIELD rel
RETURN ids, nodes, collect(rel) as rels
The only problem I'm having is that this query returns duplicate nodes. For example, a node can be returned several times with different PageRank scores. I suspect this is due to having multiple source nodes, so the different scores correspond to the PageRank scores for each source node. There are no duplicates when PageRank is run for a single source node.
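One way to check this suspicion is to count how many rows the stream produces for each nodeId before any LIMIT is applied. The query below is a minimal diagnostic sketch that reuses the same legacy algo.pageRank.stream call and $paper_ids parameter as the query above. If the stream yields every node once per call, every node should show up once per source node. For example, with two source nodes you would see a single result row where rowsPerNode is 2.

```cypher
// Diagnostic sketch (not part of the original query): for each nodeId,
// count how many rows the per-source PageRank calls produce.
MATCH (p) WHERE p.paper_id IN $paper_ids
CALL algo.pageRank.stream(null, null, {direction: "BOTH", sourceNodes: [p]})
YIELD nodeId, score
// Group by nodeId only, so rows from different source nodes are merged here
WITH nodeId, count(*) AS rowsPerNode
// Summarize: how many nodes appear 1 time, 2 times, and so on
RETURN rowsPerNode, count(*) AS nodeCount
ORDER BY rowsPerNode
```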
This is an issue because I want to RETURN n unique nodes (in the above code block, n = 25). In a typical run with two source nodes I only get about 21-22 unique nodes.
How would I go about ensuring that I RETURN n unique nodes?
As @Tezra says, you need to decide how you want to resolve multiple scores for the same node (the same nodeId).
The following options all involve making a simple change to this clause:
WITH p, nodeId, score
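Options:
Use the maximum score:
WITH nodeId, MAX(score) AS score
Use the minimum score:
WITH nodeId, MIN(score) AS score
Use the average score:
WITH nodeId, AVG(score) AS score
Note that p is dropped in every option. A WITH that contains an aggregate function groups rows by all of its non-aggregated items, so keeping p would still produce one row per (source node, nodeId) pair, and the duplicates would remain. The ORDER BY score DESC and LIMIT 25 that follow the clause stay as they are.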
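Putting it together, here is a sketch of the complete query using the maximum-score option. It assumes, as in the question, that the legacy Graph Algorithms library (algo.*) and APOC are installed. The aggregating WITH groups by nodeId alone, which merges each node's rows from the different source nodes into one. Because of this, the LIMIT counts distinct nodes.

```cypher
// Sketch: the question's query with duplicates resolved by keeping each
// node's maximum score across all source nodes (swap MAX for MIN or AVG as needed).
MATCH (p) WHERE p.paper_id IN $paper_ids
CALL algo.pageRank.stream(null, null, {direction: "BOTH", sourceNodes: [p]})
YIELD nodeId, score
// p is deliberately left out, so rows are grouped by nodeId only
WITH nodeId, MAX(score) AS score
ORDER BY score DESC
LIMIT 25
MATCH (n) WHERE id(n) = nodeId
WITH collect(nodeId) AS ids, collect(n {.*, score}) AS nodes
CALL apoc.algo.cover(ids) YIELD rel
RETURN ids, nodes, collect(rel) AS rels
```

Each node now appears at most once before the LIMIT. As long as the graph contains at least 25 nodes, this should return 25 distinct nodes.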