Merging in Neo4j when uniqueness is based on both the node property and presence of a relationship

Question

In my model I have 2 types of nodes: Systems and Datasets. Every Dataset belongs to a System, which is represented by a CONTAINS_DATASET relationship.

As a general rule, a Dataset name must be unique within a given System. I allow duplicate Dataset names if the Datasets are contained in different Systems.

I am trying to enforce this via Cypher so that, when someone tries to create a Dataset contained in a System, the Cypher will only create a new Dataset if its name does not match an existing Dataset contained in that System.

I think I need a MERGE statement that filters on the relationship to the specified System, but I don't know how to do that. I have included the code I am using below, but it's only a MERGE that does not consider which System the Dataset resides in.

:params {
  "data": {
    "System": [
      {
        "name": "System 1",
        "datasets": [
          {
            "name": "Customers"
          }
        ]
      },
      {
        "name": "System 2",
        "datasets": [
          {
            "name": "Customers"
          }
        ]
      },
      {
        "name": "System 3",
        "datasets": [
          {
            "name": "Products"
          }
        ]
      }
    ]
  }
}

UNWIND {data} as data
UNWIND data.System as systems
UNWIND systems.datasets as datasets
MERGE (sy:System { name: systems.name})
    ON CREATE SET sy.status='New'
    ON MATCH SET  sy.status='Updated'
MERGE (da:Dataset { name: datasets.name})
MERGE (sy)-[:CONTAINS_DATASET]->(dan:Dataset { name: datasets.name })
return *

The above query also creates 2 additional nodes that I'm not expecting, so any help with that would be appreciated as well.

InverseFalcon · Accepted Answer

You're very close here; your Cypher actually already has the solution you need:

MERGE (sy)-[:CONTAINS_DATASET]->(dan:Dataset { name: datasets.name })

This pattern has a bound variable, sy (previously matched to a graph element), and an unbound variable, dan (not previously matched to anything; this is its first occurrence in the query).

MERGE is like doing a MATCH of the whole pattern, and its behavior then changes depending on whether that pattern exists in the graph.

If it exists in the graph (sy has a :CONTAINS_DATASET relationship to a :Dataset node with the given name) then it will reuse the existing graph structures, and dan will be bound to that existing connected node.

If it does not exist in the graph, then the entire pattern will be created, including any not-previously-bound nodes such as dan. Before creating anything, MERGE locks the bound parts of the pattern (sy) and double-checks that nothing changed between the time it checked and the time it took the locks; only then does it create the parts of the pattern that were not previously bound. sy was previously bound, so MERGE reuses that same node instead of creating a new one. (dan:Dataset { name: datasets.name }) was not previously bound, so a new node with this label and property is created and connected to sy via a :CONTAINS_DATASET relationship.
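To see that match-or-create behavior in isolation, here is a minimal sketch that uses literal values in place of the parameter (the names 'System 1' and 'Customers' come from the sample data in the question). Running it twice should still leave exactly one Customers node connected to System 1, because the second run matches the existing pattern instead of creating it again.

```cypher
// Bind sy first, so the next MERGE treats it as an existing (bound) node.
MERGE (sy:System {name: 'System 1'})
// Matches a Customers Dataset already connected to this System;
// if there is none, creates a new Dataset node plus the relationship.
MERGE (sy)-[:CONTAINS_DATASET]->(dan:Dataset {name: 'Customers'})
RETURN sy.name AS system, dan.name AS dataset
```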

So this behavior should be exactly what you need: it reuses a connected node with that name if one exists, or otherwise creates a brand-new node with that name and connects it to that particular sy node.
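If you want to confirm the rule holds after loading data, a check along these lines (a sketch, not part of the original answer) should return no rows when every Dataset name is unique within its System.

```cypher
// Group the Datasets connected to each System by name;
// any group with more than one node would violate the per-System rule.
MATCH (sy:System)-[:CONTAINS_DATASET]->(da:Dataset)
WITH sy, da.name AS dataset, count(DISTINCT da) AS copies
WHERE copies > 1
RETURN sy.name AS system, dataset, copies
```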

As for the duplicates you're seeing, this is because of the line just before that one:

MERGE (da:Dataset { name: datasets.name})

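This isn't needed here; the line after it already does what you need. On its own, that MERGE only looks at the name, so for each name that doesn't exist yet it creates a Dataset node with no relationship to any System. The following MERGE won't reuse that node because it isn't connected to sy, so it creates another one. With your sample data that leaves one unconnected Customers node and one unconnected Products node, which would account for the 2 extra nodes you're seeing. Remove that line and your query should work for you.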
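Putting that together, the question's query with the extra MERGE removed would look roughly like this. It also replaces the old {data} parameter syntax with $data, on the assumption that you're running Neo4j 4.0 or later, where the curly-brace form is no longer accepted; on 3.x the original {data} form works as well.

```cypher
// Expects a parameter named data, shaped like the :params example in the question.
WITH $data AS data
UNWIND data.System AS systems
UNWIND systems.datasets AS datasets
MERGE (sy:System {name: systems.name})
  ON CREATE SET sy.status = 'New'
  ON MATCH SET sy.status = 'Updated'
// Reuses a same-named Dataset only if it is already connected to this System;
// otherwise creates a new Dataset node plus the relationship.
MERGE (sy)-[:CONTAINS_DATASET]->(dan:Dataset {name: datasets.name})
RETURN *
```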

More details about MERGE behavior are in our knowledge base article.
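Removing the extra MERGE stops new orphans from appearing, but Dataset nodes created by earlier runs of the original query stay in the database. Since every Dataset in this model should belong to a System, a cleanup along these lines should remove them. It's only a sketch, so it's worth running it with RETURN da in place of DELETE da first to see exactly which nodes it would delete.

```cypher
// Every Dataset should hang off a System, so a Dataset with no incoming
// CONTAINS_DATASET relationship is an orphan left by the earlier query.
MATCH (da:Dataset)
WHERE NOT (:System)-[:CONTAINS_DATASET]->(da)
// Plain DELETE fails if a node still has relationships; these orphans should have none.
DELETE da
```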
