Jens Ehrich
Jens Ehrich

Reputation: 651

Return node with most connections to other nodes having a specific property

I have the following relationships in my message graph:

(m:Message)-[:ORIGINATED_AT]->(i:IP)-[:IN_NETWORK]-(n:Network)

I am trying to find the IPs and Networks sending the most messages with category = 'spam'.

I have tried the following query:

// IPs sending the most spam:
match (ip:IpAddress)--(m:Message)
where toLower(m.category) contains 'spam'
with ip, size(()-[:ORIGINATED_AT]->(ip)) as volume
ORDER BY volume DESC LIMIT 10
match (ip)--(net:Network)
return ip, net

but it seems to find "networks that have sent the most messages with at least one spam message", but I want "networks that have sent the most spam". Also, the query returns less than the 10 expected results.

I removed the message variable to get the IPs sending the most messages, and this seems to work as expected:

// IPs sending the most messages:
match (ip:IpAddress)
with ip, size(()-[:ORIGINATED_AT]->(ip)) as volume
ORDER BY volume DESC LIMIT 10
match (ip)--(net:Network)
return ip, net

How can I modify my query to correctly count only the relationships where the related (m:Message) has category:'spam'?

Upvotes: 1

Views: 155

Answers (1)

Dave Bennett
Dave Bennett

Reputation: 11216

I think you are on the right track, you are just overcomplicating it a little. You just need the count aggregation od every message that contains "spam" per IP address.

MATCH (ip:IpAddress)--(m:Message)
WHERE toLower(m.category) contains 'spam'
RETURN ip, count(m) AS spam_messages
ORDER BY spam_messages DESC LIMIT 10

To use size() in this situation replace count() with a different aggregation function, collect(), which puts the elements in a list or collection instead of counting them. Then use size() to return the number of items in the list. This approach would be useful if you needed the actual elements themselves for some kind of post processing afterwards.

MATCH (ip:IpAddress)--(m:Message)
WHERE toLower(m.category) contains 'spam'
WITH ip, collect(m) AS spam_messages
RETURN ip, size(spam_messages) as spam_messages_size
ORDER BY spam_messages_size DESC LIMIT 10

Upvotes: 1

Related Questions