canbax

Reputation: 3856

"with" keyword changes the results

Query 1

MATCH (n:PC)-[r:C1|C2|U]->(n2) WHERE r.date IS NULL OR (r.date > 1 AND r.date < 15)
WITH n,n2, 
reduce(t = 0, r in [x IN collect(r) WHERE type(x) = 'C1' or type(x) = 'C2'] | t + r.amount) as totalAmount1, 
reduce(t = 0, r in [x IN collect(r) WHERE type(x) = 'U' ] | t + r.amount) as totalAmount2 
WHERE totalAmount1 >= 1291 AND totalAmount2 >= 1000
RETURN COUNT(*)

Query 2

MATCH (n:PC)-[r:C1|C2|U]->(n2) WHERE r.date IS NULL OR (r.date > 1 AND r.date < 15)
WITH n, 
reduce(t = 0, r in [x IN collect(r) WHERE type(x) = 'C1' or type(x) = 'C2'] | t + r.amount) as totalAmount1, 
reduce(t = 0, r in [x IN collect(r) WHERE type(x) = 'U' ] | t + r.amount) as totalAmount2 
WHERE totalAmount1 >= 1291 AND totalAmount2 >= 1000
RETURN COUNT(*)

As you can see, these two queries are VERY similar. The only difference is that the first query uses WITH n,n2 while the second uses WITH n. I expected both of them to return THE SAME result.

BUT Query 1 returns 0 while Query 2 returns 113. Why? How?

Note: the database contains nearly 7 million nodes and 10 million edges.

Upvotes: 0

Views: 35

Answers (2)

Graphileon

Reputation: 5385

Your Query 1 reduces, for each n, only the paths that lead to one specific n2.

Your Query 2 reduces, for each n, all paths that lead to any node.

Hence, the number of paths, and thus of items in each collect(r), is greater in Query 2 than in Query 1.

Assuming that the amounts are > 0, the totals computed in Query 2 are at least as large as those in Query 1, so its WHERE clause is true more often. In particular, a row in Query 1 only passes when the C1/C2 total and the U total both come from relationships to the same n2.
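To make the difference concrete, here is a small Python simulation (not Cypher, and not run against Neo4j) of both queries on two hypothetical relationships: node a has a C1 relationship to b with amount 1300 and a U relationship to c with amount 1000. The helper names passes_date and count_passing are made up for illustration; only the date filter and the 1291/1000 thresholds come from the question.

```python
from collections import defaultdict

# Hypothetical toy relationships as (n, type, n2, amount, date);
# not taken from the question's database.
RELS = [
    ("a", "C1", "b", 1300, 5),
    ("a", "U", "c", 1000, None),
]

def passes_date(date):
    # Mirrors: r.date IS NULL OR (r.date > 1 AND r.date < 15)
    return date is None or 1 < date < 15

def count_passing(key_fn):
    """Group filtered relationships by key_fn, sum amounts per type group,
    and count the groups meeting both thresholds (the WHERE after WITH)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [totalAmount1, totalAmount2]
    for n, rtype, n2, amount, date in RELS:
        if not passes_date(date):
            continue
        key = key_fn(n, n2)
        if rtype in ("C1", "C2"):
            totals[key][0] += amount
        elif rtype == "U":
            totals[key][1] += amount
    return sum(1 for t1, t2 in totals.values() if t1 >= 1291 and t2 >= 1000)

query1 = count_passing(lambda n, n2: (n, n2))  # like WITH n, n2, ...
query2 = count_passing(lambda n, n2: n)        # like WITH n, ...
print(query1, query2)  # 0 1
```

Grouping by (n, n2) puts the two amounts into separate groups, so neither group meets both thresholds, while grouping by n alone adds them together. This mirrors how Query 1 can return 0 while Query 2 does not.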

In fact, it's not the WITH keyword itself that changes the results, but the list of variables you project in it.

Upvotes: 1

cybersam

Reputation: 66989

You should read the documentation on how aggregating functions like COLLECT work, especially the information on "grouping keys".

All WITH or RETURN clause terms that do NOT use aggregating functions are used as "grouping keys", which control what data should be aggregated together.

Query 1 has 2 grouping keys (n and n2), but Query 2 only has 1 (n). So, you would generally expect the resulting aggregations to be different.
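A rough Python analogue of the grouping-key rule, assuming a simplified row model (plain dicts standing in for MATCH rows; the function name with_aggregate is hypothetical): every projected term that is not aggregated becomes part of the group key, and collect() gathers the remaining values within each group.

```python
from collections import defaultdict

def with_aggregate(rows, grouping_keys, collect_field):
    """Rough analogue of `WITH <grouping_keys>, collect(<collect_field>)`:
    the non-aggregated terms form the key that decides which rows are combined."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[k] for k in grouping_keys)
        groups[key].append(row[collect_field])
    return dict(groups)

# Hypothetical rows, one per matched relationship.
rows = [
    {"n": "a", "n2": "b", "r": "r1"},
    {"n": "a", "n2": "b", "r": "r2"},
    {"n": "a", "n2": "c", "r": "r3"},
]

print(with_aggregate(rows, ["n", "n2"], "r"))  # {('a', 'b'): ['r1', 'r2'], ('a', 'c'): ['r3']}
print(with_aggregate(rows, ["n"], "r"))        # {('a',): ['r1', 'r2', 'r3']}
```

With two grouping keys the relationships of one n are split into one list per n2; with only n they all end up in a single list, so any reduce() over that list sees more items.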

Upvotes: 1
