Tim Holdsworth
Tim Holdsworth

Reputation: 499

Adding multiple relationships using WITH, WHERE, and UNWIND

I have data in the following structure:

{"id": "1", "name": "A. I. Lazarev", "org": "United States Department of State", "tags": [{"t": "Infrared"}, {"t": "Near-infrared spectroscopy"}, {"t": "Infrared astronomy"}, {"t": "Data collection"}], "pubs": [{"i": "1542417502", "r": 6}], }
{"id": "2", "name": "Stevan Spremo", "tags": [{"t": "Micro-g environment"}, {"t": "Antibiotics"}, {"t": "Bacteriology"}], "pubs": [{"i": "222163962", "r": 0}], }
{"id": "3", "name": "Bricchi G", "pubs": [{"i": "2417067698", "r": 1}, {"i": "2406980973", "r": 1}]}

Some of the rows have tags, some have organizations, some have both, and some have neither.

I'd like to add relationships between (1) authors and tags, (2) authors and organizations, and (3) authors and publications. I have the publications as nodes already, so it should be fairly straightforward to get (3) once I get (1) and (2).

I have been trying to use the following code:

CALL apoc.periodic.iterate(
"CALL apoc.load.json('file:/test.txt') YIELD value AS q RETURN q",
"UNWIND q.id as id
CREATE (a:Author {id:id, name:q.name, citations:q.n_citation, publications:q.n_pubs})
WITH q, a
UNWIND q.tags as tags
MERGE (t:Tag {{name: tags.t}})
CREATE (a)-[:HAS_TAGS]->(t)
WITH q, a
WHERE q.org is not null
MERGE (o:Organization {name: q.org})
CREATE (a)-[:AFFILIATED_WITH]->(o)",
{batchSize:10000, iterateList:true, parallel:false})

The tags and the organizations show up multiple times in the data, but should only have one node each, so I have used MERGE to create unique nodes for these.

The problem with the following code is that it creates duplicate AFFILIATED_WITH relationships - it actually creates the same number of AFFILIATED_WITH relationships as there are tags.

How can I change the cypher query so that it isn't creating duplicate relationships?

Upvotes: 0

Views: 178

Answers (1)

cybersam
cybersam

Reputation: 67019

After this clause:

UNWIND q.tags as tags

your query will have as many data rows as the number of tags for the current q (each row will have q, a, id, tags values). The subsequent operations will be performed once per data row. That is why you are creating too many AFFILIATED_WITH relationships.

To solve your issue, you have to reduce the number of data rows appropriately, at the appropriate time (and this will also speed up your processing, since unnecessarily repeated operations will be avoided). In your case, you can just change the second WITH q, a clause to WITH DISTINCT q, a:

CALL apoc.periodic.iterate(
  "CALL apoc.load.json('file:///test.txt') YIELD value AS q RETURN q",
  "CREATE (a:Author {id:q.id, name:q.name, citations:q.n_citation, publications:q.n_pubs})
   WITH q, a
   UNWIND q.tags as tags
   MERGE (t:Tag {name: tags.t})
   CREATE (a)-[:HAS_TAGS]->(t)
   WITH DISTINCT q, a
   WHERE q.org is not null
   MERGE (o:Organization {name: q.org})
   CREATE (a)-[:AFFILIATED_WITH]->(o)",
  {batchSize:10000, iterateList:true, parallel:false}
)

I have also simplified the query by removing the unnecessary UNWIND q.id as id clause, and fixed some syntax issues.

[UPDATED]

If you want to add the AUTHORED relationships (as requested in the comments to this answer), you should do that before you create the AFFILIATED_WITH relationships -- since the WHERE q.org is not null clause would filter out some q nodes. Also, whenever you use CREATE to create a relationship, Cypher requires that you specify a direction for the relationship.

CALL apoc.periodic.iterate(
  "CALL apoc.load.json('file:///test.txt') YIELD value AS q RETURN q",
  "CREATE (a:Author {id:q.id, name:q.name, citations:q.n_citation, publications:q.n_pubs})
   WITH q, a
   UNWIND q.tags as tags
   MERGE (t:Tag {name: tags.t})
   CREATE (a)-[:HAS_TAGS]->(t)
   WITH DISTINCT q, a
   UNWIND q.pubs as pubs
   MERGE (p:Quanta {id: pubs.i})
   CREATE (a)-[r:AUTHORED {rank: pubs.r}]->(p)
   WITH q, a
   WHERE q.org is not null
   MERGE (o:Organization {name: q.org})
   CREATE (a)-[:AFFILIATED_WITH]->(o)",
  {batchSize:10000, iterateList:true, parallel:false}
)

Upvotes: 3

Related Questions