Jonathan Lyon

Reputation: 3962

Find, group by and count relationships in Neo4J using cypher

Hi, I have two node labels in Neo4j 3.03:

INTERACTIONS

uidpid  100000060085836_170782808933_10154454374183934
name    Dean Hohaia
postid  170782808933_10154454374183934
pageid  170782808933
userid  100000060085836

POSTS

shares      0
comments    0
postid      100129044360_100138063361365
pageid      100129044360
type        link
createdtime 2010-03-30 00:43:23
pagename    Study in New Zealand
likes       4

I have a relationship called likes, which was created like this:

MATCH (i:interactions),(p:posts)
WHERE i.userid = p.userid
CREATE (i)-[:likes]->(p)

which looks like this:

uidpid  613637235481924_125251397514429_1000501533322740
name    Toth Mariann
postid  125251397514429_1000501533322740
pageid  125251397514429
userid  613637235481924

Basically the same as interactions.

I need a query that shows, for each pagename in posts, the count of userid interactions by pagename:

Source Pagename  Matched Pagename   Userids count #
Air New Zealand  Rialto Channel     12494
Air New Zealand  RNZ                2979
Air New Zealand  SKY TV             4651

In essence - for each pagename in posts, show the count of all other pages that each user has engaged with.

Do I need to create any other relationships to achieve this?

Here's the exact example data I'm using, as CSVs: https://www.wetransfer.com/downloads/37e89c65f029344a2205ca717f04b6fe20161024051807/0d4ab3

Upvotes: 0

Views: 978

Answers (1)

Gabor Szarnyas

Reputation: 5057

First, as you mentioned, we connect the interactions and the posts based on the postid (1):

MATCH (i:interactions), (p:posts)
WHERE i.postid = p.postid
CREATE (i)-[:likes]->(p)
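
Matching two labels on a property value, as in (1), can be slow on larger data sets, because without an index every interaction has to be compared against every post. A minimal sketch, assuming Neo4j 3.x index syntax and the label and property names from the question; whether the planner actually uses these indexes depends on the version and the data:

```cypher
// Assumption: Neo4j 3.x schema syntax. Run each statement separately,
// before the MATCH ... CREATE statements that join on these properties.
CREATE INDEX ON :posts(postid);
CREATE INDEX ON :interactions(userid);
```

The first index supports the postid join in (1); the second one is meant for the later join on userid.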

Then we create a node for each user (2):

MATCH (i:interactions)
WITH DISTINCT i.userid AS userid
CREATE (u:users {userid: userid})

And connect them to the interactions (3):

MATCH (u:users), (i:interactions)
WHERE u.userid = i.userid
CREATE (u)-[:performed]->(i)

It's possible to perform these two CREATE operations (2 and 3) with a single MERGE, but the performance seems to be much worse; I'm not sure why:

MATCH (i:interactions)
MERGE (u:users {userid: i.userid})-[:performed]->(i)
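
One thing to watch for with the single MERGE: MERGE matches or creates the whole pattern, so when the performed relationship to i does not exist yet, it creates a new users node for that interaction instead of reusing one node per userid. Whether this also explains the slowdown is an assumption, not something measured here. A sketch that merges the node first and the relationship second, with an optional uniqueness constraint (Neo4j 3.x syntax assumed):

```cypher
// Optional, run once as its own statement: keeps one users node per userid
// and backs the MERGE lookup with an index.
CREATE CONSTRAINT ON (u:users) ASSERT u.userid IS UNIQUE;

// Merge the user node on its own, then merge the relationship.
MATCH (i:interactions)
MERGE (u:users {userid: i.userid})
MERGE (u)-[:performed]->(i)
```

Splitting the pattern this way lets the first MERGE find an existing users node by userid before the relationship is added.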

Having created the likes and performed relationships, we can now formulate the query like this (4):

MATCH (source:posts)<-[:likes]-(:interactions)<-[:performed]-(:users)-[:performed]->(:interactions)-[:likes]->(matched:posts)
RETURN source.pagename, matched.pagename, COUNT(matched)
LIMIT 10

Warning: this took two minutes to run on my laptop (late-2011 quad-core i7 CPU + SSD).

The query starts from a post (source) and navigates through likes and performed edges to each user that performed the interaction. It then navigates to those users' other interactions (this time through performed and likes edges), ending in a node that represents a post (matched). The matched nodes are counted with the COUNT aggregation function and returned along with the pagename properties.
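
Note that COUNT(matched) counts matched paths, so a user with several interactions on the same two pages is counted more than once, and pairs where source and matched belong to the same page are included. If the goal is the number of distinct users per pair of pages, as in the table in the question, a variant could look like the sketch below. The u variable and the column aliases are illustrative names, and the query has not been run against the question's data:

```cypher
// Count each user once per (source page, matched page) pair.
MATCH (source:posts)<-[:likes]-(:interactions)<-[:performed]-(u:users)-[:performed]->(:interactions)-[:likes]->(matched:posts)
// Skip pairs where both posts belong to the same page.
WHERE source.pagename <> matched.pagename
RETURN source.pagename AS sourcePagename,
       matched.pagename AS matchedPagename,
       count(DISTINCT u) AS userCount
ORDER BY sourcePagename, userCount DESC
```

The LIMIT is left out here so that every pair of pages is returned; add it back while experimenting if the query runs too long.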

A related suggestion: label names should start with an uppercase letter and should be singular, i.e. Post, Interaction and User.
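
If the data is already loaded, the labels can be changed in place instead of re-importing. A minimal sketch, assuming the lowercase labels posts, interactions and users used in the query above:

```cypher
// Run each statement separately: add the new label, then drop the old one.
MATCH (n:posts) SET n:Post REMOVE n:posts;
MATCH (n:interactions) SET n:Interaction REMOVE n:interactions;
MATCH (n:users) SET n:User REMOVE n:users;
```

Any queries, indexes or constraints that mention the old labels would need to be updated to match.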

Upvotes: 1
