Sarath Babu
Sarath Babu

Reputation: 13

Neo4j query with Boolean operations between Nodes

We have a graph DB with nodes: Resume and Skill. Skill node contains property - "Name". It has sample Values like Java, C ...

And we created relationship has_skill between Resume and Skill Nodes. And each Resume node can contain any number of Skill nodes with relationship has_skill.

Now, we want to retrieve all resume nodes which satisfies following boolean condition.

((JAVA AND MYSQL) OR (C AND MSSQL)) AND HTML 

We have tried the following query

match(n:Skill)--(n1:PannaResume)
where ((n.name contains "java" AND n.name contains "mysql")
OR (n.name contains "c" AND n.name contains "mssql")) 
AND n.name contains "html"
return n1

But this is not giving the expected result. Please suggest correct query.

Upvotes: 1

Views: 522

Answers (2)

Tezra
Tezra

Reputation: 8833

You're query is failing because you are asking Neo4j to find a person, who has a skill, that is at least 3 skills simultaneously. Obviously, given skill n, it will never be java and mysql and html. (I assume you are using contains to 'strip' any version information they may have added to the skill name, like javaee or java 8. Just remember this will also match javascript. I would recommend however you are getting this data into Neo4j, to have it parse the general name out to a separate field. If you are treating name as a skill array, then you probably have no matches, and your scheme is too fragile to do anything useful with)

I would recommend this match pattern so that Neo4j checks for any valid skill node

MATCH (n:Skill)<--(n1:PannaResume)
WITH n1, COLLECT(n.name) as skills
WHERE ANY(name in skills WHERE name contains "html")
AND (ANY(name in skills WHERE name contains "java")
 OR ANY(name in skills WHERE name contains "mysql")
)
OR (
(ANY(name in skills WHERE name contains "c")
 OR ANY(name in skills WHERE name contains "mssql")
))

of course, more explicit is better. This will perform much better.

MATCH (n1:PannaResume)-[:has_skill]->(:Skill {name : "html"})
WHERE ((n1)-[:has_skill]->(:Skill {name : "java"}) AND (n1)-[:has_skill]->(:Skill {name : "mysql"}))
OR ((n1)-[:has_skill]->(:Skill {name : "c"}) AND (n1)-[:has_skill]->(:Skill {name : "mssql"}))
RETURN n1

But you can also simplify a bit (and make it easier on the Cypher planner) by separating your query into a set of distinct queries, and union the results like this

MATCH (n1:PannaResume)-[:has_skill]->(:Skill {name : "html"})
WHERE (n1)-[:has_skill]->(:Skill {name : "java"}) 
  AND (n1)-[:has_skill]->(:Skill {name : "mysql"})
RETURN n1
UNION
MATCH (n1:PannaResume)-[:has_skill]->(:Skill {name : "html"})
WHERE (n1)-[:has_skill]->(:Skill {name : "c"})
  AND (n1)-[:has_skill]->(:Skill {name : "mssql"})
RETURN n1

This should be equivalent to the previous queries, but may be more efficient depending on what you are doing/Neo4j version. The WITH will break the query into different stages, so will force the Cypher planner to find the skill nodes first, and then find a person connected to those skills.

MATCH (html:Skill {name : "html"}), (java:Skill {name : "java"}), (c:Skill {name : "c"}), (mysql:Skill {name : "mysql"}), (mssql:Skill {name : "mssql"})
WITH *
MATCH (n1:PannaResume)-[:has_skill]->(html)
WHERE ((n1)-[:has_skill]->(java) AND (n1)-[:has_skill]->(mysql)
OR ((n1)-[:has_skill]->(c) AND (n1)-[:has_skill]->(mssql)
RETURN n1

If you need to worry about efficiency, you should experiment with different quires using the PROFILE keyword. The key thing you are trying to do is minimize how many rows each 'stage' of the query produces, as every stage after that will be multiplied by the number of rows the previous stage produced. (The Cypher planner can choose to perform multiple queries in parallel, and then find the intersection of them. So it's not obvious how alteration to a Cypher will affect the plan. PROFILE will show you inefficiencies in your query, so take note of steps with high dbhits or result in a lot of rows.)

Upvotes: 0

Bruno Peres
Bruno Peres

Reputation: 16365

You can try checking by the pattern existence:

match (n1:PannaResume)-[:has_skill]->(n:Skill)
where ((n1)-[:has_skill]->(:Skill {name : "java"}) and (n1)-[:has_skill]->(:Skill {name : "mysql"}))
OR ((n1)-[:has_skill]->(:Skill {name : "c"}) and (n1)-[:has_skill]->(:Skill {name : "mssql"}))
AND (n1)-[:has_skill]->(:Skill {name : "HTML"})
RETURN n1

Upvotes: 0

Related Questions