Reputation: 1332
I have a large graph in which some nodes represent people. All of them have firstname and surname properties; some also have a middlename property. I'm looking for nodes that might represent the same person, so I'm comparing different permutations of names. I currently compare surnames and the first initial of firstname (some nodes have only initials), but I can't figure out how to test middlename when it exists.
My current query is:
match (a:Author), (b:Author)
where
a.surname=b.surname and
( a.firstname starts with 'A' and b.firstname starts with 'A')
return distinct a,b
My understanding is that OPTIONAL MATCH refers only to patterns, so that won't work. I can't find a way to write an if statement that makes sense.
It may be that it makes more sense for me to do this programmatically, rather than relying just on direct Cypher queries, but I was hoping to keep it really simple and just do it in Cypher.
Some examples to clarify what I want to do.
Example 1:
Node 1: firstname "John" middlename "Patrick" lastname "Smith"
Node 2: firstname "J" middlename "P" lastname "Smith"
Node 3: firstname "J" middlename "Q" lastname "Smith"
Node 4: firstname "J" lastname "Smith"
I want a query that will return nodes 1, 2, and 4 as 'matching'.
Example 2:
Node 1: firstname "Jane" lastname "Smith"
Node 2: firstname "J" middlename "P" lastname "Smith"
Node 3: firstname "J" middlename "Q" lastname "Smith"
Node 4: firstname "J" lastname "Smith"
Here, I want all 4 nodes, since the 'canonical' name doesn't have a middle name.
Upvotes: 3
Views: 1436
Reputation: 3119
I think you need something like the following:
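match (a:Author), (b:Author)
where
  // consider each pair only once and never compare a node with itself
  id(a) < id(b) and
  ( a.surname=b.surname) and
  ( a.firstname starts with 'A' and b.firstname starts with 'A') and
  // middle names must agree only when both nodes actually have one
  ( a.middlename=b.middlename OR a.middlename IS NULL OR b.middlename IS NULL)
return a,b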
"How to work with null" is a good reference for puzzles like the one you're dealing with.
EDIT: Let's break it down with some pseudocode:
if (a.middlename is null) return true;   // a has no middle name, so nothing can conflict
if (b.middlename is null) return true;   // same for b
return a.middlename = b.middlename;      // both are present here, so they must be equal
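If you do end up doing this programmatically, the same null-tolerant check can be sketched in Python. This is only a rough sketch under a few assumptions: nodes are plain dicts keyed by the property names from the query (firstname, middlename, surname), the helper names `names_compatible` and `same_person` are made up for illustration, and middle names are compared by prefix rather than strict equality so that "P" is treated as compatible with "Patrick" as in Example 1. The Cypher query above uses strict equality, so it would not pair those two.

```python
from typing import Optional


def names_compatible(x: Optional[str], y: Optional[str]) -> bool:
    """Null-tolerant comparison: a missing value never contradicts anything."""
    if x is None or y is None:
        return True
    # Assumption (not in the Cypher query): an initial such as "P" is
    # compatible with "Patrick", so compare by prefix, ignoring case.
    x, y = x.lower(), y.lower()
    return x.startswith(y) or y.startswith(x)


def same_person(a: dict, b: dict) -> bool:
    """Same surname, same first initial, and no conflicting middle name."""
    return (
        a["surname"] == b["surname"]
        and a["firstname"][:1] == b["firstname"][:1]
        and names_compatible(a.get("middlename"), b.get("middlename"))
    )


# Example 1 from the question, using "surname" as in the query.
example1 = [
    {"firstname": "John", "middlename": "Patrick", "surname": "Smith"},
    {"firstname": "J", "middlename": "P", "surname": "Smith"},
    {"firstname": "J", "middlename": "Q", "surname": "Smith"},
    {"firstname": "J", "surname": "Smith"},
]

canonical = example1[0]
matches = [i + 1 for i, node in enumerate(example1) if same_person(canonical, node)]
print(matches)  # [1, 2, 4]
```

Running the same check against Example 2, where the canonical node has no middlename, accepts all four nodes, because a missing middle name is compatible with any value.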
Upvotes: 3