Reputation: 488
Consider the following schema, where orange nodes are of type Person
and brown nodes are of type Movie
. (This is from the "movies" dataset that is shipped with Neo4j).
The query that I am trying to write goes as follows:
Find all reviewer pairs, one following the other, and return the names of the two reviewers. If they have both reviewed the same movie, return the title of the movie as well. Restrict the query so that the first letter of the name of both reviewers is ’J’
Now, consider the following CYPHER query:
MATCH (a:Person)-[:REVIEWED]->(:Movie),
(b:Person)-[:REVIEWED]->(:Movie),
(a:Person)-[:FOLLOWS]->(b:Person)
OPTIONAL MATCH (a:Person)-[:REVIEWED]->(m:Movie)<-[:REVIEWED]-(b:Person)
WHERE a.name STARTS WITH 'J'
AND b.name STARTS WITH 'J'
RETURN DISTINCT a.name, b.name, m.title
This returns the following (incorrect) results:
Why?
What I've gathered so far:
WHERE
applies to the (OPTIONAL
) MATCH
directly preceding itWHERE
constraints are considered while looking for matches, not afterwards.OPTIONAL MATCH
does not apply fully, null
is put for the missing parts of the patternI still don't understand, why "Angela Scope" shows up in the results. In any case, if the predicates should forbid it to ever show up.
PS: I am aware that the following query returns the correct results
MATCH (a:Person)-[:REVIEWED]->(:Movie),
(b:Person)-[:REVIEWED]->(:Movie),
(a:Person)-[:FOLLOWS]->(b:Person)
WHERE a.name STARTS WITH 'J'
AND b.name STARTS WITH 'J'
OPTIONAL MATCH (a:Person)-[:REVIEWED]->(m:Movie)<-[:REVIEWED]-(b:Person)
RETURN DISTINCT a.name, b.name, m.title
however, I'd like to find out why these two queries return different results and especially why the one mentioned first returns exactly this result.
Upvotes: 2
Views: 4242
Reputation: 30397
Sure, you're almost at the answer already:
the WHERE applies to the (OPTIONAL) MATCH directly preceding it
This is important. You should not view the WHERE clause as independent, as it is associated with and modifies the preceding clause. So read it out like MATCH ... WHERE ...
and OPTIONAL MATCH ... WHERE ...
and WITH ... WHERE ...
as a whole.
Remember that an OPTIONAL MATCH will never filter out rows. It will keep existing rows, and for any newly introduced variables, will try to find matches using the pattern provided that passes its WHERE clause. If it doesn't find matches, newly introduced variables will be set to null. And again...no filtering.
So for this snippet:
OPTIONAL MATCH (a:Person)-[:REVIEWED]->(m:Movie)<-[:REVIEWED]-(b:Person)
WHERE a.name STARTS WITH 'J'
AND b.name STARTS WITH 'J'
Angela Scope and Jessica Thompson have a follows relationship between them, and they have reviewed the same movie, The Replacements, but they fail the WHERE clause, since Angela's name doesn't start with a 'J'. Therefore the OPTIONAL MATCH didn't find anything, so the newly introduced variable m
will come back as null
. Nothing will be filtered.
In order to have a predicate filter your rows, the WHERE clause needs to be associated with a MATCH, or a WITH. So we could fix it as in the correct query you added later, or like this:
MATCH (a:Person)-[:REVIEWED]->(:Movie),
(b:Person)-[:REVIEWED]->(:Movie),
(a:Person)-[:FOLLOWS]->(b:Person)
OPTIONAL MATCH (a:Person)-[:REVIEWED]->(m:Movie)<-[:REVIEWED]-(b:Person)
WITH a, m, b
WHERE a.name STARTS WITH 'J'
AND b.name STARTS WITH 'J'
RETURN DISTINCT a.name, b.name, m.title
And this is less efficient since the filtering happens after we've done the OPTIONAL MATCH. Better to filter earlier, so we only execute the OPTIONAL MATCH when we already have our filtered results.
Also to note, you have an issue with duplicates here due to your matching of these patterns at the start: (a:Person)-[:REVIEWED]->(:Movie)
. While this does indeed find persons who are reviewers, you will get a row per path that matches the pattern...so for Jessica Thompson, for example, you can see she has reviewed 2 movies, so there are two paths that match that pattern, which is why she's showing up at least twice per other reviewer in your results (and it will be multiplicative, depending on the number of movies the other reviewer has reviewed.
To fix this, instead of looking for all paths of a :Person reviewing a :Movie, look for a :Person where they have reviewed a movie:
MATCH (a:Person)
WHERE (a)-[:REVIEWED]->()
Because the pattern becomes a predicate, Cypher only has to find at least one :REVIEWED relationship from a :Person, and then it can stop looking, and you won't have those duplicate results.
Upvotes: 8