Thiago Negri
Thiago Negri

Reputation: 5351

Select nodes that has all relationships in Neo4j

Suppose I have two kinds of nodes, Person and Competency. They are related by a KNOWS relationship. For example:

(:Person {id: 'thiago'})-[:KNOWS]->(:Competency {id: 'neo4j'})

How do I query this schema to find out all Person that knows all nodes of a set of Competency?

Suppose that I need to find every Person that knows "java" and "haskell" and I'm only interested in the nodes that knows all of the listed Competency nodes.

I've tried this query:

match (p:Person)-[:KNOWS]->(c:Competency) where c.id in ['java','haskell'] return p.id;

But I get back a list of all Person that knows either "java" or "haskell" and duplicated entries for those who knows both.

Adding a count(c) at the end of the query eliminates the duplicates:

match (p:Person)-[:KNOWS]->(c:Competency) where c.id in ['java','haskell'] return p.id, count(c);

Then, in this particular case, I can iterate the result and filter out results that the count is less than two to get the nodes I want.

I've found out that I could do it appending consecutive match clauses to keep filtering the nodes to get the result I want, in this case:

match (p:Person)-[:KNOWS]->(:Competency {id:'haskell'})
match (p)-[:KNOWS]->(:Competency {id:'java'})
return p.id;

Is this the only way to express this query? I mean, I need to create a query by concatenating strings? I'm looking for a solution to a fixed query with parameters.

Upvotes: 1

Views: 712

Answers (4)

Dave Bennett
Dave Bennett

Reputation: 11216

How about this...

WITH ['java','haskell'] AS comp_col
MATCH (p:Person)-[:KNOWS]->(c:Competency)
WHERE c.name in comp_col
WITH comp_col
, p
, count(*) AS total
WHERE total = length(comp_col)
RETURN p.name, total
  1. Put the competencies you want in a collection.
  2. Match all the people that have either of those competencies
  3. Get the count of compentencies by person where they have the same number as in the competency collection from the start

I think this will work for what you need, but if you are building these queries programatically the best performance you get might be with successive match clauses. Especially if you knew which competencies were most/least common when building your queries, you could order the matches such that the least common were first and the most common were last. I think that would chunk down to your desired persons the fastest.

It would be interesting to see what the plan analyzer in the sheel says about the different approaches.

Upvotes: 1

Michael Hunger
Michael Hunger

Reputation: 41706

with ['java','haskell'] as skills
match (p:Person)-[:KNOWS]->(c:Competency) 
where c.id in skills 
with p.id, count(*) as c1 ,size(skills) as c2
where c1 = c2
return p.id

Upvotes: 2

Bossie
Bossie

Reputation: 3612

Untested, but this might do the trick:

match (p:Person)-[:KNOWS]->(c:Competency)
with p, collect(c.id) as cs
where all(x in ['java', 'haskell'] where x in cs)
return p.id;

Upvotes: 2

Christophe Willemsen
Christophe Willemsen

Reputation: 20185

One thing you can do, is to count the number of all skills, then find the users that have the number of skill relationships equals to the skills count :

MATCH (n:Skill) WITH count(n) as skillMax
MATCH (u:Person)-[:HAS]->(s:Skill) 
WITH u, count(s) as skillsCount, skillMax
WHERE skillsCount = skillMax
RETURN u, skillsCount

Chris

Upvotes: 2

Related Questions