Reputation: 285
I have a performance-critical application that has to match multiple nodes to another node based on regex matching. My current query is as follows:
MATCH (person: Person {name: 'Mark'})
WITH person
UNWIND person.match_list AS match
MATCH (pet: Animal)
WHERE pet.name_regex =~ match
MERGE (person)-[:OWNS_PET]->(pet)
RETURN pet
However, this query runs VERY slowly (around 500 ms on my workstation). The graph contains around 500K nodes, and around 10K of them will match the regex.
I'm wondering whether there is a more efficient way to rewrite this query so that it produces the same results but runs faster.
EDIT:
When I run this query for several Persons from multiple threads, I get a TransientError exception:
neo4j.exceptions.TransientError: ForsetiClient[3] can't acquire ExclusiveLock{owner=ForsetiClient[14]} on NODE(1889), because holders of that lock are waiting for ForsetiClient[3].
EDIT 2:
Person:name is unique and indexed.
Animal:name_regex is not indexed.
Upvotes: 0
Views: 206
Reputation: 8833
First, I would simplify your query as much as possible; the way you are doing it now does a lot of wasted work after a match has already been found:
MATCH (person: Person {name: 'Mark'}), (pet: Animal)
WHERE ANY(match in person.match_list WHERE pet.name_regex =~ match)
MERGE (person)-[:OWNS_PET]->(pet)
RETURN pet
This way, only one MERGE is attempted per pet even if several patterns match it, and once one pattern matches, the remaining patterns are not evaluated against the same pet. It also gives Cypher the freedom to optimize the query as well as it can for your data.
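If you want to see where the time goes, you can prefix the query with PROFILE to inspect the execution plan; it should show an index lookup for the Person followed by a full scan over Animal, which is where the regex cost sits (plan details vary by Neo4j version, and PROFILE actually runs the query, including the MERGE, whereas EXPLAIN only shows the estimated plan):

PROFILE
MATCH (person: Person {name: 'Mark'}), (pet: Animal)
WHERE ANY(match in person.match_list WHERE pet.name_regex =~ match)
MERGE (person)-[:OWNS_PET]->(pet)
RETURN pet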
To improve the Cypher further, you will need to optimize your data model. Regex matching is expensive (it requires scanning every Animal node and running a string match on each one), so if the match patterns are largely shared between people, it would be better to break them out into their own nodes and connect people to them, so that the work of one regex match can be reused everywhere it is repeated.
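A minimal sketch of that refactor, using hypothetical :Pattern nodes and :USES_PATTERN / :PATTERN_MATCHES relationship types (these names are illustrative, not part of your existing model), could look like this, run as three separate statements:

// 1. Store each distinct regex once and link the people that use it.
MATCH (person: Person)
UNWIND person.match_list AS regex
MERGE (pat: Pattern {regex: regex})
MERGE (person)-[:USES_PATTERN]->(pat)

// 2. Evaluate every regex against the animals once, caching the results.
MATCH (pat: Pattern), (pet: Animal)
WHERE pet.name_regex =~ pat.regex
MERGE (pat)-[:PATTERN_MATCHES]->(pet)

// 3. Linking owners to pets is now a plain traversal with no regex work.
MATCH (person: Person {name: 'Mark'})-[:USES_PATTERN]->(:Pattern)-[:PATTERN_MATCHES]->(pet: Animal)
MERGE (person)-[:OWNS_PET]->(pet)
RETURN DISTINCT pet

Whether the caching step pays off depends on how often patterns are shared and how often the Animal data changes; step 2 would need to be re-run (or scoped to new nodes) whenever animals are added.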
Upvotes: 2