JC JL
JC JL

Reputation: 23

Search over big amount of nodes and ordering by path

I have a database with nodes of some kind of entities, one of them is User, each User has a property called idcountry in order to indicate the country of the user, I have created many relationship between my Users, but I am looking for make some suggestions to them, one of those suggestions is based on all the user in the same user country, but I want to put on the top of the list the users which are on the second grade, I wrote this query:

MATCH (a:`User`)
WHERE a.idcountry = 115
WITH COLLECT(a) AS nodes
UNWIND nodes AS node
OPTIONAL MATCH p = (me:`Usuario` {iduser: 7119046})-[*2..2]-(node)
RETURN node,
    CASE WHEN p IS NULL THEN 0 ELSE MIN(LENGTH(p)) END AS close_to
ORDER BY close_to DESC
LIMIT 25

The problem is that I have about of 11,000,000 of user in the country 115 (this property has a index), and this query takes 44759 ms to run, I guess that this query is scanning all the nodes on this country and then it's making the operation to determine the grade (close_to).

Exists some way to perform this type of queries with best performance?

Thank you

Upvotes: 2

Views: 40

Answers (1)

cybersam
cybersam

Reputation: 67019

[Edited]

You seem to have quite a few issues with your Cypher logic. For example:

  1. The following 2 clauses do a lot of work for no reason. You are putting all the matching User nodes in a collection, and then immediately breaking the new collection back into separate rows again. The nodes collection is not otherwise used.

     WITH COLLECT(a) AS nodes
     UNWIND nodes AS node
    
  2. You use [*2..2] to force p to only contain paths with a fixed length, and then you calculate the length of p if it is not NULL -- when you already know what the length has to be.

  3. You do not prevent the same node from showing up multiple times in your results. This is because your OPTIONAL MATCH result nodes will also appear as MATCH result nodes, and it is also possible for user 7119046 to be associated with the same node using multiple paths.

[Answer 1] If you are just looking for up to 25 nodes that are linked to you by a distance of 0, 1, or 2 relationships, ordered by decreasing path length, the following query should work for you. I presume that :Usuario(iduser) and :User(idcountry) are both indexed.

MATCH p = (me:`Usuario` {iduser: 7119046})-[*..2]-(node:User {idcountry: 115})
RETURN DISTINCT node
ORDER BY LENGTH(p) DESC
LIMIT 25

[Answer 2] If you are looking for up to 25 nodes that are either linked to you by a distance of exactly 2 steps, or NOT linked to you by a distance of exactly 2 steps, with the former appearing first in the results, the following query should work for you. The same indexing assumptions as above apply.

Notes: This query will actually return up to 50 rows, but you should ignore anything past the first 25 rows. Also, it is still possible for the same node to appear twice in the result (once from the first MATCH and once from the second).

MATCH (me:`Usuario` {iduser: 7119046})-[*2..2]-(n:User {idcountry: 115})
RETURN DISTINCT node
LIMIT 25
UNION
MATCH (me:`Usuario` {iduser: 7119046}), (node:User {idcountry: 115})
WHERE NOT (me)-[*2..2]-(node)
RETURN DISTINCT node
LIMIT 25

Upvotes: 2

Related Questions