Zaheer
Zaheer

Reputation: 2872

Neo4j WITH statement returning incorrect value

I have the following query I'm running in the browser client:

MATCH (m:Media)-[r:WITH]->(:Tag {name:'peripheral'})
WITH count(r) as totes, m
WHERE m-[:WITH]->(:Tag {name: 'target'})
RETURN COUNT(*), totes, count(*)/(totes*1.0)

I am trying to find the percentage of Media nodes with Tag 'target' out of all the Media nodes with Tag 'peripheral'.

Ex: # of Media nodes w/ 'target' & 'peripheral" / # of Media nodes w/ just 'peripheral'

Most of the query works except for the variable totes which always returns 1. If I separate the query just keeping the first half, the count is correct - it is only when I use the WITH and subsequent WHERE clause that the count is messed up. Any ideas why totes is incorrect?

Upvotes: 0

Views: 89

Answers (2)

Dave Bennett
Dave Bennett

Reputation: 11216

Your query is incorrect for what you want to do. Your query only matches Media with relationships to Tag nodes named peripheral where it also matches Tag nodes named target.

I believe what you really want is something more along the lines of this...

MATCH (m:Media)-[r:WITH]->(:Tag {name:'Peripheral'})
WITH count(r) as totes, collect(m) as mcol
unwind mcol as m
MATCH m-[:WITH]->(:Tag {name: 'Target'})
RETURN COUNT(*) as both, totes, count(*)/(totes*1.0)

Upvotes: 1

ceej
ceej

Reputation: 1893

My take on this is that your query is returning a row for each matched media node and hence the count of totes will always be 1. To carry the total count forward you need to aggregate the returned nodes - hence using COLLECT and then UNWIND. Dave Bennett is correct although I believe that his second suggestion is what you need. To break it down:

This line gets your initial pattern match.

MATCH (m:Media)-[r:WITH]->(:Tag {name:'Peripheral'})

Now you need to return your total count and carry forward a collection of the matched media nodes. Just carrying m forward results in the issue you have, i.e. getting a count of 1 for each media node you match

WITH count(r) as totes, collect(m) as mcol

Now you can unwind the collection of media nodes that you carried forward to transform it back into individual rows that you can use to match a subset of those media nodes that have the tag target

UNWIND mcol as media
MATCH media-[:WITH]->(:Tag {name: 'Target'})

Now you should have all the necessary info to get the result that you desire, i.e. the percentage of Media nodes with Tag 'target' out of all the Media nodes with Tag 'peripheral'

RETURN COUNT(*) as both, totes, COUNT(*)/(totes*1.0)

For other readers it may be worth pointing out that (totes*1.0) is necessary to force one of the arguments to be a float. If this is not done then integer maths will take place and you will get 0 or 1! I also find it more expressive to indicate the exact expression that I am aggregating, so the return statement would become

RETURN COUNT(media) as both, totes, COUNT(media)/(totes*1.0)

and to express an actual percentage you could use

RETURN COUNT(media) as both, totes, COUNT(media)/(totes*1.0)*100

The only other thing would be to pass forward the second count explicitly

WITH COUNT(media) as both, totes
RETURN both, totes, both/(totes*1.0)*100

So to bring it all together it becomes

MATCH (m:Media)-[r:WITH]->(:Tag {name:'Peripheral'})
WITH count(r) as totes, collect(m) as mcol
UNWIND mcol as media
MATCH media-[:WITH]->(:Tag {name: 'Target'})
WITH COUNT(media) as both, totes
RETURN both, totes, both/(totes*1.0)*100

Not trying to take anything away from the previous answer - just expanding on it as I find sometimes it is useful to know why as well as how. So this is for my own benefit and future reference! :)

Upvotes: 1

Related Questions