Yuval
Yuval

Reputation: 317

Neo4j Cypher: Creating very large queries with UNION

I need to query multiple paths (to draw in the UI) from different start nodes in Neo4j (could be hundreds of paths), I'm using UNION to combine the queries but I suspect that it's not the most efficient way to retrieve the data (I'm also using indexes on the start nodes). When creating large queries, all kinds of nasty stuff happen.

The query looks something like this:

MATCH path = ((n:actor)-[*0..2]->(g)-[p:was_filmed_at]->(k:movie_set)-[l:located]->(i:state)) 
WHERE n.name=’john doe1' AND (g:acted_in OR g:produced) 
AND p.addDate >= '2017-12-10T00:00:00+00:00' 
AND p.addDate <= '2017-12-12T08:08:06+00:00' 
AND l.addDate >= '2017-12-10T00:00:00+00:00' 
AND l.addDate <= '2017-12-12T08:08:06+00:00' 
RETURN nodes(path), relationships(path) 
LIMIT 1000

UNION 

MATCH path = ((n:actor)-[*0..2]->(g)-[p:was_filmed_at]->(k:movie_set)-[l:located]->(i:state)) 
WHERE n.name=’john doe2' AND (g:acted_in OR g:produced) 
AND p.addDate >= '2017-12-10T00:00:00+00:00' 
AND p.addDate <= '2017-12-12T08:08:06+00:00' 
AND l.addDate >= '2017-12-10T00:00:00+00:00' 
AND l.addDate <= '2017-12-12T08:08:06+00:00' 
RETURN nodes(path), relationships(path) 
LIMIT 1000

UNION 

MATCH path = ((n:actor)-[*0..2]->(g)-[p:was_filmed_at]->(k:movie_set)-[l:located]->(i:state)) 
WHERE n.name=’john doe3' AND (g:acted_in OR g:produced) 
AND p.addDate >= '2017-12-10T00:00:00+00:00' 
AND p.addDate <= '2017-12-12T08:08:06+00:00' 
AND l.addDate >= '2017-12-10T00:00:00+00:00' 
AND l.addDate <= '2017-12-12T08:08:06+00:00' 
RETURN nodes(path), relationships(path) 
LIMIT 1000

This is a query with only 3 paths, and it does not scale well with hundreds. Is there a simple, more efficient way to do this?

Upvotes: 1

Views: 138

Answers (1)

Bruno Peres
Bruno Peres

Reputation: 16365

Since you are changing only the n.name comparison, you can rewrite your query using IN operator and pass the list of names as an array. This way:

MATCH path = ((n:actor)-[*0..2]->(g)-[p:was_filmed_at]->(k:movie_set)-[l:located]->(i:state))
WHERE n.name IN ['john doe1', 'john doe2', 'john doe3']
AND (g:acted_in OR g:produced)
AND '2017-12-10T00:00:00+00:00' <= p.addDate <= '2017-12-12T08:08:06+00:00'
AND '2017-12-10T00:00:00+00:00' <= l.addDate <= '2017-12-12T08:08:06+00:00'
RETURN nodes(path), relationships(path)
LIMIT 1000

Also you can chain inequalities like: AND '2017-12-10T00:00:00+00:00' <= p.addDate <= '2017-12-12T08:08:06+00:00'. Make sure you have an index on :actor(name) for quick index lookup of your starting node. (Thanks to @InverseFalcon)

Upvotes: 1

Related Questions