Reputation: 61
Please see the image of the CSV file. I am working with Cypher in Neo4j. As you can see, each activity and its timestamp belong to a case_id. Many activities share the same case_id (here you see case_ids 3, 2, and 1, but please imagine there are many more). I want to group the activities that belong to the same case_id and run the same query on each group (the grouping is essential).
Is there a way to do that other than rewriting the same query for each group, as done here in three steps?
1.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///XY" AS row
WITH toInteger(row.case_id) AS cid, row
WHERE cid=3
CREATE (act:Activity {caseId: cid, activityName: row.activity, time: row.timestamp})
'QUERY'
2.
LOAD CSV WITH HEADERS FROM "file:///XY" AS row
WITH toInteger(row.case_id) AS cid, row
WHERE cid=2
CREATE (act:Activity {caseId: cid, activityName: row.activity, time: row.timestamp})
'QUERY'
3.
LOAD CSV WITH HEADERS FROM "file:///XY" AS row
WITH toInteger(row.case_id) AS cid, row
WHERE cid=1
CREATE (act:Activity {caseId: cid, activityName: row.activity, time: row.timestamp})
'QUERY'
So basically I want to generalize WHERE cid=3 (or 2 or 1) in the sense of iterating over all distinct case IDs without explicitly naming them, a bit like a for-each loop in Java: for each element in an array (array content: the groups by case_id), do QUERY.
Any idea how?
Thank you in advance and I will be happy to provide a better description if this sounds too cryptic.
Update: Here is the query:
MATCH (act:Activity)
WHERE act.caseId = 1 // and here I want to be able to simplify for EVERY caseId
WITH act ORDER BY act.time ASC
WITH apoc.coll.frequencies(apoc.coll.pairsMin(collect(act.activityName))) AS g
UNWIND g AS p
RETURN *
Upvotes: 0
Views: 311
Reputation: 30417
It seems to me that a single LOAD CSV query ought to handle this. Drop the WHERE filter and set caseId to the integer value of row.case_id for every row:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///XY" AS row
WITH toInteger(row.case_id) AS cid, row
CREATE (act:Activity {caseId: cid, activityName: row.activity, time: row.timestamp})
Okay, I see you want to execute some query with each group. Can you explain why executing the query after the CREATE in the LOAD CSV won't work?
Would performing the query post-import work for you?
Some more information on the query you intend to run would be helpful.
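If a post-import query is acceptable, one possible approach is to let Cypher's aggregation do the grouping: any non-aggregated expression in a WITH alongside collect() acts as the grouping key, so using act.caseId there yields one list per case without naming the IDs. The following is only a sketch based on the query from the question's update. It assumes the APOC library is installed, that the time strings sort chronologically as text (for example ISO 8601), and that collect() keeps the row order established by the preceding ORDER BY. The aliases names, pair, and occurrences are made up for illustration.

```cypher
// Sort all activities by case and then by time before aggregating.
MATCH (act:Activity)
WITH act ORDER BY act.caseId, act.time ASC
// act.caseId is the implicit grouping key: one list of names per case.
WITH act.caseId AS caseId, collect(act.activityName) AS names
// Build consecutive pairs per case and count how often each pair occurs.
UNWIND apoc.coll.frequencies(apoc.coll.pairsMin(names)) AS p
RETURN caseId, p.item AS pair, p.count AS occurrences
ORDER BY caseId
```

Each result row would then carry the caseId it belongs to, so the per-case results stay separate even though the query runs only once.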
Upvotes: 1