Reputation: 739
I have some user event data and would like to create a graph from it. A snapshot of the data:
Now, the _id column has duplicate records, but they are actually the same person; however, there are multiple sessionField records for the same _id.
What I'd want is something like this:
Node A -> sessionNode a1 -> Action Node a11 (with event type as properties, 4 in this case)
       -> sessionNode a2 -> Action Node a21 (with event type as properties, 2 in this case)
Node B -> sessionNode b1 -> Action Node b11 (with event type as properties, 3 in this case)
I've tried the following code, but being new to graphs I'm not able to reproduce that structure (session_streams_y holds the same data as _id):
LOAD CSV WITH HEADERS FROM 'file:///df_temp.csv' AS users
CREATE (p:Person {nodeId: users._id, sessionId: users.session_streams_y})
CREATE (sn:Session {sessId: users.sessionField, sessionId: users.session_streams_y})
MATCH (p:Person)
with p as ppl
MATCH (sn:Session)
WITH ppl, sn as ss
WHERE ppl.sessionId=ss.sessionId
MERGE (ppl)-[:Sessions {sess: 'Has Sessions'}]-(ss)
WITH [ppl,ss] as ns
CALL apoc.refactor.mergeNodes(ns) YIELD node
RETURN node
This gives something different from what I want.
Upvotes: 0
Views: 41
Reputation: 66967
Something like this may work for you:
LOAD CSV WITH HEADERS FROM 'file:///df_temp.csv' AS row
MERGE (p:Person {id: row._id})
MERGE (s:Session {id: row.sessionField})
FOREACH(
x IN CASE WHEN s.eventTypes IS NULL OR NOT row.eventType IN s.eventTypes THEN [1] END |
SET s.eventTypes = COALESCE(s.eventTypes, []) + row.eventType)
MERGE (p)-[:HAS_SESSION]->(s)
RETURN p, s
The resulting Person and Session nodes would be unique, each Session node would have an eventTypes list with distinct values, and the appropriate Person and Session nodes would be connected by a HAS_SESSION relationship.
An Action node does not seem to be necessary.
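To compare the result with the layout sketched in the question, a query along these lines might help. It is only a sketch: it assumes the import above has already run, and the column aliases (person, session, eventTypeCount) are names I made up for readability.

```cypher
// Assumes the LOAD CSV import above has already run, so Person.id,
// Session.id and Session.eventTypes exist as set there.
// The column aliases are illustrative, not taken from the original data.
MATCH (p:Person)-[:HAS_SESSION]->(s:Session)
RETURN p.id AS person,
       s.id AS session,
       s.eventTypes AS eventTypes,
       size(s.eventTypes) AS eventTypeCount
ORDER BY person, session
```

Each row should correspond to one Person -> Session pair, with eventTypeCount playing the role of the "4 in this case" / "2 in this case" counts from the question.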
Upvotes: 1