iouri
iouri

Reputation: 2929

NEO4J Query to exclude some nodes

With a data model that looks similar to this:

(u:User)-[:POSTED]->(p:Post {created_at: 123)-[:ABOUT]->(t:Topic {name: "Blue")

What is the best way to find distinct count of users who posted/created a post with {created_at: 123} AND also don't have a post with {created_at: 124} about topic "Blue".

Closest I can get is to collect ids and then exclude them but that doesn't scale when you have a lot of nodes (millions).

[EDITED]

I also need the created_at times to be specifiable as ranges.

Upvotes: 0

Views: 151

Answers (2)

cybersam
cybersam

Reputation: 67009

This query allows you to specify created_at ranges. In this example, the desirable range is [123..130], and the undesirable "Blue" range is [131..140]. In your actual query, the range endpoints should be specified by parameters.

MATCH (user:User)-[:POSTED]->(p1:Post)
WHERE 123 <= p1.created_at <= 130
WITH user
OPTIONAL MATCH (user)-[:POSTED]->(p2:Post)-[:ABOUT]->(:Topic{name:"Blue"})
WHERE 131 <= p2.created_at <= 140
WITH user, p2
WHERE p2 IS NULL
RETURN COUNT(DISTINCT user) AS userCount;

The OPTIONAL MATCH clause is there to match the undesirable "Blue" paths, and the WHERE p2 IS NULL clause will filter out user nodes that have any such paths.

Upvotes: 1

InverseFalcon
InverseFalcon

Reputation: 30397

Assuming you have an index on Post.created_at and Topic.name (for speed), this should work:

MATCH (user:User)-[:POSTED|CREATED]->(:Post{created_at:123})
WHERE NOT EXISTS ( (user)-[:POSTED|CREATED]->(:Post{created_at:124})-[:ABOUT]->(:Topic{name:"Blue"}) )
RETURN count(DISTINCT user) as userCount

It's worth profiling that query, and if it isn't using the created_at and name indexes, supply that to the query with USING INDEX after your MATCH.

Upvotes: 0

Related Questions