neo4j - performance of query to match nodes with a common relationship

Question

I have created a database from Twitter data and have a relationship between Users and Places like:

(:User)-[:WAS_AT]-> (p:Place)

There are 610,464 relationships of that type, between 59,257 Users and 823 Places.

I want to get all the users who were in the same place:

MATCH q=(u1:User)-[:WAS_AT]->(:Place)<-[:WAS_AT]-(u2:User)
RETURN q

That query has not finished after more than two hours. What am I doing wrong?

I tried adding an index on the users, but that did not improve the performance.

Thanks in advance,

cybersam · Accepted Answer

Your query is trying to get every distinct pair of visits to the same Place. So if there were N visits to a Place, you are trying to get N*(N-1) paths. And you are trying to do that for each and every Place.
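
To see how large that result would be before running the original query, the per-Place counts can be aggregated. This is a rough sketch that assumes N is the number of WAS_AT relationships pointing at each Place; `visits` and `pathCount` are illustrative names, not from the question:

```cypher
// Count the WAS_AT relationships arriving at each Place,
// then sum N*(N-1) over all Places to get the number of paths
// the original two-sided MATCH would have to produce.
MATCH (:User)-[:WAS_AT]->(p:Place)
WITH p, count(*) AS visits
RETURN sum(visits * (visits - 1)) AS pathCount
```

With 610,464 relationships over 823 Places, the average is about 742 visits per Place, so even a perfectly even spread would give roughly 823 × 742 × 741 ≈ 450 million paths, and an uneven spread gives more.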

What you actually want is a list of the distinct Users who visited each Place (which will be at most N in size). Here is how you can do that:

MATCH (u:User)-[:WAS_AT]->(place:Place)
RETURN place, COLLECT(DISTINCT u) AS users

The DISTINCT option is only needed if a User can have multiple WAS_AT relationships to the same Place.
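
If the full user lists are still too large to return at once, a count per Place can serve as a quick sanity check first. A minimal sketch, where `userCount` and the limit of 10 are arbitrary choices for illustration:

```cypher
// Number of distinct Users per Place, busiest Places first.
MATCH (u:User)-[:WAS_AT]->(place:Place)
RETURN place, count(DISTINCT u) AS userCount
ORDER BY userCount DESC
LIMIT 10
```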
