Neo4j Cypher: Unable to define relationships with duplicate records

Question

I have some user event data and would like to build a graph from it. A snapshot of the data:

The _id column contains duplicate values; rows with the same _id are the same person. There are, however, multiple sessionField records for the same _id.

What I'd want is something like this:

Node A -> sessionNode a1 -> Action Node a11 (with event type as properties, 4 in this case)
       -> sessionNode a2 -> Action Node a21 (with event type as properties, 2 in this case)
Node B -> sessionNode b1 -> Action Node b11 (with event type as properties, 3 in this case)

I've tried the following code, but being new to graphs, I haven't been able to produce that structure:

(session_streams_y holds the same data as _id.)

LOAD CSV WITH HEADERS FROM 'file:///df_temp.csv' AS users
CREATE (p:Person {nodeId: users._id, sessionId: users.session_streams_y})
CREATE (sn:Session {sessId: users.sessionField, sessionId: users.session_streams_y})

MATCH (p:Person) 
with p as ppl
MATCH (sn:Session) 
WITH ppl, sn as ss
WHERE ppl.sessionId=ss.sessionId
MERGE (ppl)-[:Sessions {sess: 'Has Sessions'}]-(ss)
WITH [ppl,ss] as ns
CALL apoc.refactor.mergeNodes(ns) YIELD node
RETURN node

This gives something different from what I want.

cybersam · Accepted Answer

Something like this may work for you:

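// Each CSV row is one event; MERGE reuses a matching node instead of creating a duplicate.
LOAD CSV WITH HEADERS FROM 'file:///df_temp.csv' AS row
MERGE (p:Person {id: row._id})
MERGE (s:Session {id: row.sessionField})
// FOREACH as a conditional: the list has one element only if this eventType is new for the session.
FOREACH(
  x IN CASE WHEN s.eventTypes IS NULL OR NOT row.eventType IN s.eventTypes THEN [1] ELSE [] END |
  SET s.eventTypes = COALESCE(s.eventTypes, []) + row.eventType)
MERGE (p)-[:HAS_SESSION]->(s)
RETURN p, s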

The resulting Person and Session nodes would be unique, each Session node would have an eventTypes list with distinct values, and the appropriate Person and Session nodes would be connected by a HAS_SESSION relationship.
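
To check the result after the import, a query along these lines may help. It is only a sketch, run as a separate statement, and it assumes the labels, the HAS_SESSION relationship type, and the id and eventTypes properties created by the query above; the column aliases are arbitrary names chosen for illustration.

```cypher
// List each person with their sessions and the distinct event types per session.
// Assumes the Person/Session labels and properties from the import query above.
MATCH (p:Person)-[:HAS_SESSION]->(s:Session)
RETURN p.id AS person,
       s.id AS session,
       s.eventTypes AS eventTypes,
       size(s.eventTypes) AS eventTypeCount  // number of distinct event types
ORDER BY person, session
```

Each person should appear once per session, and eventTypeCount should correspond to the per-session counts sketched in the question (for example 4 for a1 and 2 for a2), provided those counts refer to distinct event types.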

An Action node does not seem to be necessary.
