Ant Kutschera
Ant Kutschera

Reputation: 6588

How to match Neo4j node or relationship count when filtering on node or relationship property

I have a graph:

CREATE (c1:Contract { id: "V-9087-4321" }),
       (c2:Contract { id: "V-8046-2304" }),
       (p:Partner {id: "C-4837-4536"}),
       (p)-[:POLICY_HOLER]->(c1),
       (p)-[:POLICY_HOLER]->(c2)

MATCH (p:Partner) WHERE p.id = "C-4837-4536"
CREATE (c:Claim { id: "ef70", date: "2019-04-27" }),
       (p)-[:CLAIMANT {on: "2019-04-27"}]->(c)

MATCH (p:Partner) WHERE p.id = "C-4837-4536"
CREATE (c:Claim { id: "ab90", date: "2019-04-28" }),
       (p)-[:CLAIMANT {on: "2019-04-28"}]->(c)

I want to find all partners who have more than one claim after April 1st:

I've tried this:

MATCH (claim:Claim)<-[r:CLAIMANT]-(p:Partner)
WITH count(r) as cnt, p, claim
WHERE cnt > 1 AND claim.date > '2019-04-01'
RETURN cnt, p.id

It doesn't work, because (if I remove the where clause), it is returning the two claims each with cnt=1, rather than actually aggregating them. I've tried selecting based on claim.date as well as r.on.

How can this be done with Cypher?

Upvotes: 0

Views: 469

Answers (2)

Rajendra Kadam
Rajendra Kadam

Reputation: 4052

@cybersam explained about aggregate functions very well. I suggest looking at their answer to understand how aggregate functions work. But their query is not filtering claims based on the count.

You can use the following query to find all partners who have more than one claim after April 1st:

MATCH (claim:Claim)<-[r:CLAIMANT]-(p:Partner)
WHERE claim.date > '2019-04-01'
WITH count(r) as cnt, p 
WHERE cnt > 1
RETURN cnt, p.id

Upvotes: 1

cybersam
cybersam

Reputation: 66989

The neo4j aggregating functions (like COUNT()) use the non-aggregating terms in the same WITH or RETURN clause as the "grouping keys".

In your example's WITH clause, the grouping keys are p and claim, so the COUNT() function is counting the number of r relationships between each distinct p and claim pair (and so the count will always be 1). What you actually want instead is the number of r relationships for each p.

Assuming every Partner node has a unique id, you can find all partners who have more than one claim after April 1st using this simplified query:

MATCH (claim:Claim)<-[:CLAIMANT]-(p:Partner)
WHERE claim.date > '2019-04-01'
RETURN COUNT(claim) AS cnt, p.id

If you also want to return a list of all the claims for each matching partner:

MATCH (claim:Claim)<-[:CLAIMANT]-(p:Partner)
WHERE claim.date > '2019-04-01'
RETURN COUNT(claim) AS cnt, COLLECT(claim) AS claims, p.id

In addition, faster searches you should create either an index or a uniqueness constraint on :Claim(date).

Upvotes: 1

Related Questions