Reputation: 83

Using CASE with MATCH

I have a simple query looking for a common node 'x' related to many starting nodes a,b,c. If there is no common node x for a,b,c then I want to compare just a,b and b,c for common nodes 'y' and 'z' respectively. This is how I imagine it should look... but of course it does not work.

MATCH p1=(a)-[]->(x) WHERE a.n=1
WITH p1,a,x
MATCH p2=(b)-[]->(x) WHERE b.n=2
WITH p1,p2,a,b,x
MATCH p3=(c)-[]->(x) WHERE c.n=3
WITH p1,p2,p3,a,b,c,x
MATCH CASE WHEN x IS NULL THEN p4=(a)-[]->(y)<-[]-(b) WHERE a.n=1 AND b.n=2 END 
WITH p1,p2,p3,p4,a,b,c,x,y
MATCH CASE WHEN x IS NULL THEN p5=(b)-[]->(z)<-[]-(c) WHERE b.n=2 AND c.n=3 END
RETURN p1,p2,p3,p4,p5,x,y,z

I am looking at using CASE to reduce redundancy and speed up the query, since there is no need to search the pairs a,b and b,c if a common 'x' is found. How do you use CASE to create a MATCH? Perhaps there is a better way than using CASE?

QUESTION BACKGROUND

I have a very large dataset that is fairly shallow, averaging 15 nodes deep. I am looking for correlation between multiple inputs in the form of a common related node in the network. The inputs can number from 2 to 20 sequential values, and their order is relevant in that they will tend to group together in correlation. Quite often all the inputs will have a common related node, but if not I need to find other sequential groupings that have a common node, i.e. inputs 1,2 | 2,3 | 3,4,5 etc. So it's a question of whether to start wide and work down, or to start with pairs and compare the next input one at a time against the results of the last. I was hoping to use CASE to decide what to group while working through the inputs when no result is found.

As for structuring the network, it's shallow, with each node having a minimal number of relationships, so adding meta nodes likely won't help. Combining input groups to have a single start node would increase the number of start nodes exponentially; at this point there is one for each possible input, 10 million or so. I thought it best to avoid that, since my understanding is that finding the starting node is the most expensive part of the query. Sorry, I'm bound not to talk about the business case.

Upvotes: 1

Answers (2)

Tezra

Reputation: 8833

For those without access to APOC, here is the plain Cypher version of Sam's answer:

// Get Start Nodes
MATCH (a {n: 1}), (b {n: 2}), (c {n: 3})

// Find common nodes for ab and cb
OPTIONAL MATCH (a)-->(ab)<--(b)
OPTIONAL MATCH (c)-->(cb)<--(b)

// Aggregate common nodes of ab and cb
WITH a, b, c, COLLECT(ab) AS ab, COLLECT(cb) AS cb

// Return with additional Aggregate of intersection of the lists ab and cb
RETURN a, b, c, filter(n in ab WHERE n IN cb) as abc, ab, cb
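
A note for newer versions: the filter() function was removed in Neo4j 4.0, and a list comprehension does the same job. The variant below is only a sketch under that assumption. The names abNodes and cbNodes are introduced here for illustration, and the DISTINCT and the early collect are additions of this sketch rather than part of the query above; they keep the two optional matches from multiplying rows against each other before aggregation.

```cypher
// Get start nodes
MATCH (a {n: 1}), (b {n: 2}), (c {n: 3})

// Common nodes of a and b, collected before the next match
OPTIONAL MATCH (a)-->(ab)<--(b)
WITH a, b, c, collect(DISTINCT ab) AS abNodes

// Common nodes of c and b
OPTIONAL MATCH (c)-->(cb)<--(b)
WITH a, b, c, abNodes, collect(DISTINCT cb) AS cbNodes

// A list comprehension replaces filter(n IN ab WHERE n IN cb)
RETURN a, b, c, [n IN abNodes WHERE n IN cbNodes] AS abc, abNodes, cbNodes
```

If abc comes back non-empty, it holds the nodes common to all three inputs, and the pairwise lists can be ignored.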

Upvotes: 2

cybersam

Reputation: 66999

This query should return collections of the x, y, and z values. If the xs collection is not empty, then you can choose to ignore the ys and zs collections.

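MATCH (a {n: 1}), (b {n: 2})-->(w), (c {n: 3})
OPTIONAL MATCH pa=(a)-->(w)
// Keep the nodes related to b (ws) in scope; w itself is dropped by this WITH
WITH a, b, c, COLLECT(DISTINCT w) AS ws, COLLECT(DISTINCT NODES(pa)[1]) AS ys
// Re-bind w so the next match is still limited to nodes related to b
UNWIND ws AS w
OPTIONAL MATCH pc=(c)-->(w)
WITH a, b, c, ys, COLLECT(DISTINCT NODES(pc)[1]) AS zs
RETURN a, b, c, apoc.coll.intersection(ys, zs) AS xs, ys, zs;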

The interesting thing to note about your use case is that the b node is required to determine the contents of all 3 collections. So, the MATCH clause finds the nodes related to the b node, and limits the other (optional) matches to only consider those nodes -- this should speed up the query. The APOC function apoc.coll.intersection is used to intersect the ys and zs collections to get the xs collection.

[EDITED]

Instead of performing both COLLECT operations at the same time (after the second optional match), we perform the first COLLECT operation right after the first optional match so that we can avoid a cartesian product. This should speed up the query and reduce the memory requirement.

Upvotes: 2
