Reputation: 651
I have the following relationships in my message graph:
(m:Message)-[:ORIGINATED_AT]->(i:IP)-[:IN_NETWORK]-(n:Network)
I am trying to find the IPs and Networks sending the most messages with category = 'spam'.
I have tried the following query:
// IPs sending the most spam:
match (ip:IpAddress)--(m:Message)
where toLower(m.category) contains 'spam'
with ip, size(()-[:ORIGINATED_AT]->(ip)) as volume
ORDER BY volume DESC LIMIT 10
match (ip)--(net:Network)
return ip, net
but it seems to find "networks that have sent the most messages with at least one spam message", but I want "networks that have sent the most spam". Also, the query returns less than the 10 expected results.
I removed the message variable to get the IPs sending the most messages, and this seems to work as expected:
// IPs sending the most messages:
match (ip:IpAddress)
with ip, size(()-[:ORIGINATED_AT]->(ip)) as volume
ORDER BY volume DESC LIMIT 10
match (ip)--(net:Network)
return ip, net
How can I modify my query to correctly count only the relationships where the related (m:Message) has category:'spam'?
Upvotes: 1
Views: 155
Reputation: 11216
I think you are on the right track, you are just overcomplicating it a little. You just need the count aggregation od every message that contains "spam" per IP address.
MATCH (ip:IpAddress)--(m:Message)
WHERE toLower(m.category) contains 'spam'
RETURN ip, count(m) AS spam_messages
ORDER BY spam_messages DESC LIMIT 10
To use size()
in this situation replace count()
with a different aggregation function, collect()
, which puts the elements in a list or collection instead of counting them. Then use size()
to return the number of items in the list. This approach would be useful if you needed the actual elements themselves for some kind of post processing afterwards.
MATCH (ip:IpAddress)--(m:Message)
WHERE toLower(m.category) contains 'spam'
WITH ip, collect(m) AS spam_messages
RETURN ip, size(spam_messages) as spam_messages_size
ORDER BY spam_messages_size DESC LIMIT 10
Upvotes: 1