Reputation: 3962
Hi, I have two sets of labelled nodes in neo4j 3.03:
INTERACTIONS
uidpid 100000060085836_170782808933_10154454374183934
name Dean Hohaia
postid 170782808933_10154454374183934
pageid 170782808933
userid 100000060085836
POSTS
shares 0
comments 0
postid 100129044360_100138063361365
pageid 100129044360
type link
createdtime 2010-03-30 00:43:23
pagename Study in New Zealand
likes 4
I have a relationship called LIKES, which was created like this:
MATCH (i:interactions),(p:posts)
WHERE i.userid = p.userid
CREATE (i)-[:likes]->(p)
which look like this:
uidpid 613637235481924_125251397514429_1000501533322740
name Toth Mariann
postid 125251397514429_1000501533322740
pageid 125251397514429
userid 613637235481924
same as interactions basically.
I need to find a way to write a query that shows, for each pagename in posts, the count of userid interactions by pagename:
Source Pagename    Matched Pagename    Userids count
Air New Zealand    Rialto Channel      12494
Air New Zealand    RNZ                 2979
Air New Zealand    SKY TV              4651
In essence - for each pagename in posts, show the count of all other pages that each user has engaged with.
Do I need to create any other relationships to achieve this?
Here's the exact example data I'm using, as CSVs: https://www.wetransfer.com/downloads/37e89c65f029344a2205ca717f04b6fe20161024051807/0d4ab3
Upvotes: 0
Views: 978
Reputation: 5057
First, as you mentioned, we connect the interactions and the posts based on the postid (1):
MATCH (i:interactions), (p:posts)
WHERE i.postid = p.postid
CREATE (i)-[:likes]->(p)
Then we create a node for each user (2):
MATCH (i:interactions)
WITH DISTINCT i.userid AS userid
CREATE (u:users {userid: userid})
And connect them to the interactions (3):
MATCH (u:users), (i:interactions)
WHERE u.userid = i.userid
CREATE (u)-[:performed]->(i)
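It's possible to replace these two CREATE operations (2 and 3) with MERGE, but the performance seems to be much worse, not sure why. Note that the node and the relationship need separate MERGE clauses: a single MERGE over the whole pattern creates a new users node for every interaction, because MERGE either matches the entire pattern or creates all of it.
MATCH (i:interactions)
MERGE (u:users {userid: i.userid})
MERGE (u)-[:performed]->(i)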
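One possible reason for the slow MERGE, and for the slow property joins in steps 1 and 3, is that without an index every lookup by userid or postid has to scan all nodes with that label. A minimal sketch, assuming the Neo4j 3.x schema syntax and the lowercase labels used above. Whether it helps on this data set has not been measured here.

```cypher
// Assumption: Neo4j 3.x syntax. Run each statement separately,
// before the CREATE/MERGE steps above.

// A uniqueness constraint also creates an index on users.userid.
CREATE CONSTRAINT ON (u:users) ASSERT u.userid IS UNIQUE;

// Plain indexes for the properties used to join nodes.
CREATE INDEX ON :posts(postid);
CREATE INDEX ON :interactions(userid);
```

With the constraint in place, MERGE on a users node can look the node up by userid through the index instead of scanning every users node.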
Having created the likes and performed relationships, we can now formulate the query like this (4):
MATCH (source:posts)<-[:likes]-(:interactions)<-[:performed]-(:users)-[:performed]->(:interactions)-[:likes]->(matched:posts)
RETURN source.pagename, matched.pagename, COUNT(matched)
LIMIT 10
Warning: this took two minutes to run on my laptop (late-2011 quad-core i7 CPU + SSD).
The query starts from a post (source) and navigates through likes and performed edges to each user who performed the interaction. It then navigates to those users' other interactions and, again through performed and likes edges, on to a node representing a post (matched). The matched nodes are counted with the COUNT aggregation function, grouped by the two pagename properties that are returned alongside the count.
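The table in the question counts users rather than posts. Query (4) as written also pairs a page with itself, and its LIMIT 10 has no ordering, so it returns an arbitrary 10 rows. A variant sketch, assuming the users label from the steps above and that one row per pair of distinct pages is wanted. It has not been benchmarked and is unlikely to be faster than (4).

```cypher
MATCH (source:posts)<-[:likes]-(:interactions)<-[:performed]-(u:users)
      -[:performed]->(:interactions)-[:likes]->(matched:posts)
// Skip pairs where both posts belong to the same page.
WHERE source.pagename <> matched.pagename
// Count each user once per page pair, however many posts they liked.
RETURN source.pagename AS source_pagename,
       matched.pagename AS matched_pagename,
       COUNT(DISTINCT u) AS user_count
ORDER BY source_pagename, user_count DESC
```

COUNT(DISTINCT u) is what turns the result into a per-user count. COUNT(matched) counts one row per path, so a user who liked several posts on the same page is counted several times.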
A related suggestion: label names should start with an uppercase letter and should be singular, i.e. Post, Interaction and User.
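If renaming is wanted, the labels on existing nodes can be swapped in place rather than reloading the data. A sketch for one label, assuming the lowercase posts label used above. The other two labels would follow the same pattern, and the earlier queries would then need the new names.

```cypher
// Add the new singular label and drop the old one in a single pass.
MATCH (p:posts)
SET p:Post
REMOVE p:posts
```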
Upvotes: 1