Creating nodes and relations from JSON (dynamically)

Question

I've got a couple hundred JSON files with a structure like the following example:

{
"JsonExport": [
    {
        "entities": [
            {
                "identity": "ENTITY_001",
                "surname": "SMIT",
                "entityLocationRelation": [
                    {
                        "parentIdentification": "PARENT_ENTITY_001",
                        "typeRelation": "SEEN_AT",
                        "locationIdentity": "LOCATION_001"
                    },
                    {
                        "parentIdentification": "PARENT_ENTITY_001",
                        "typeRelation": "SEEN_AT",
                        "locationIdentity": "LOCATION_002"
                    }
                ],
                "entityEntityRelation": [
                    {
                        "parentIdentification": "PARENT_ENTITY_001",
                        "typeRelation": "FRIENDS_WITH",
                        "childIdentification": "ENTITY_002"
                    }
                ]
            },
            {
                "identity": "ENTITY_002",
                "surname": "JACKSON",
                "entityLocationRelation": [
                    {
                        "parentIdentification": "PARENT_ENTITY_002",
                        "typeRelation": "SEEN_AT",
                        "locationIdentity": "LOCATION_001"
                    }
                ]
            },
            {
                "identity": "ENTITY_003",
                "surname": "JOHNSON"
            }
        ],
        "identification": "REGISTRATION_001",
        "locations": [
            {
                "city": "LONDON",
                "identity": "LOCATION_001"
            },
            {
                "city": "PARIS",
                "identity": "LOCATION_002"
            }
        ]
    }
]
}

From these JSON files, I want to build a graph with the following node types: Registration, Entity and Location. I've figured this part out and came up with the following:

WITH "file:///example.json" AS json_file
CALL apoc.load.json(json_file,"$.JsonExport.*" ) YIELD value AS data
MERGE(r:Registration {id:data.identification})
WITH json_file
CALL apoc.load.json(json_file,"$.JsonExport..locations.*" ) YIELD value AS locations
MERGE(l:Locations{identity:locations.identity, name:locations.city})
WITH json_file
CALL apoc.load.json(json_file,"$.JsonExport..entities.*" ) YIELD value AS entities
MERGE(e:Entities {name:entities.surname, identity:entities.identity})

All the entities and locations should have a relation with the registration. I thought I could do this by using the following code:

MERGE (e)-[:REGISTERED_ON]->(r)
MERGE (l)-[:REGISTERED_ON]->(r)

However, this code doesn't give the desired output: it creates extra "empty" nodes and doesn't connect anything to the registration node. So the first question is: how do I connect the location and entity nodes to the registration node? And, given that there are other JSON files, the entities and locations should only be linked to their own registration.

Furthermore, I would like to create the entity -> location and entity -> entity relationships, using the given relation type (SEEN_AT or FRIENDS_WITH) as the relationship type. How can this be done? I'm kind of lost at this point and don't see how to solve this. If someone could point me in the right direction, I would be much obliged.

cybersam · Accepted Answer

Variable names (like e and r) are not stored in the DB, and are bound to values only within individual queries. MERGE on a pattern with an unbound variable will just create the entire pattern (including creating an empty node for unbound node variables).
When you MERGE a node, you should only specify the unique identifying property for that node, to avoid duplicates. Any other properties you want to set at the time of creation should be set using ON CREATE SET.
It is inefficient to parse the JSON data 3 times to get different parts of it. It is especially inefficient the way your query does it, since each subsequent CALL/MERGE group of clauses is executed multiple times (every previous CALL produces multiple rows, so the number of rows grows multiplicatively). You can use aggregation to get around that, but it is unnecessary in your case, since you can do the entire query in a single pass through the JSON data.
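
To illustrate the first two points: a relationship MERGE only connects existing nodes when both endpoint variables are bound earlier in the same query. In the question's attempt, r and l are not bound at that point, whether the two MERGE lines were run as a separate query or after a WITH that carries only json_file forward. A minimal sketch, with the identifiers and the city name copied from the example JSON purely for illustration:

```cypher
// Bind both endpoints in the same query before merging the relationship.
MATCH (r:Registration {id: "REGISTRATION_001"})
MATCH (e:Entities {identity: "ENTITY_001"})
MERGE (e)-[:REGISTERED_ON]->(r)
// MERGE on the identifying property only; ON CREATE SET runs only
// when this MERGE actually creates the node.
MERGE (l:Locations {identity: "LOCATION_001"})
  ON CREATE SET l.name = "LONDON"
MERGE (l)-[:REGISTERED_ON]->(r)
```

If either MATCH finds nothing, the query produces no rows and creates nothing, which is usually preferable to silently creating empty nodes.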

This may work for you:

WITH "file:///example.json" AS json_file
CALL apoc.load.json(json_file, "$.JsonExport.*") YIELD value AS data
MERGE (r:Registration {id: data.identification})
FOREACH(ent IN data.entities |
  MERGE (e:Entities {identity: ent.identity})
  ON CREATE SET e.name = ent.surname
  MERGE (e)-[:REGISTERED_ON]->(r)
  FOREACH(loc1 IN ent.entityLocationRelation |
    MERGE (l1:Locations {identity: loc1.locationIdentity})
    MERGE (e)-[:SEEN_AT]->(l1))
  FOREACH(ent2 IN ent.entityEntityRelation |
    MERGE (e2:Entities {identity: ent2.childIdentification})
    MERGE (e)-[:FRIENDS_WITH]->(e2))
)
FOREACH(loc IN data.locations |
  MERGE (l:Locations{identity:loc.identity})
  ON CREATE SET l.name = loc.city
  MERGE (l)-[:REGISTERED_ON]->(r)
)

For simplicity, it hard-codes the relationship types (REGISTERED_ON, SEEN_AT and FRIENDS_WITH) instead of reading them from typeRelation, as MERGE only supports hard-coded relationship types.
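
If the relationship type has to come from the typeRelation field, one option is the APOC procedure apoc.merge.relationship, which takes the type as a string. The following is only a sketch for the entity -> location relationships. It assumes your APOC version provides that procedure and that every typeRelation value is a valid relationship type name; the variable names link and rel are illustrative.

```cypher
WITH "file:///example.json" AS json_file
CALL apoc.load.json(json_file, "$.JsonExport.*") YIELD value AS data
UNWIND data.entities AS ent
MERGE (e:Entities {identity: ent.identity})
WITH e, ent
// Entities without entityLocationRelation yield no rows here and are skipped.
UNWIND ent.entityLocationRelation AS link
MERGE (l:Locations {identity: link.locationIdentity})
WITH e, l, link
// Arguments: start node, type, identifying props, on-create props, end node.
CALL apoc.merge.relationship(e, link.typeRelation, {}, {}, l) YIELD rel
RETURN count(rel) AS mergedRelationships
```

The entity -> entity relationships could be handled the same way in a second pass, unwinding ent.entityEntityRelation and merging the target node on childIdentification. Because UNWIND drops rows for entities that have no such list, this is better run as its own query than appended to the single-pass FOREACH query above.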
