Porjaz
Porjaz

Reputation: 791

Neo4j update nodes from CSV file

I need to update the already existing nodes with new data and create a new one if the node doesn't exist. I am trying to do that with the following query:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM
'file:///C:/Users/Zona5/Documents/Neo4j/check/import/cd1_5.csv' AS     line1
MERGE (c:Company {companyNumber:line1.companyNumber})
WITH c, line1
MERGE (ca:CompanyAddress)
ON CREATE SET ca.county=line1.County
ON MATCH SET ca.county=line1.County
ON CREATE SET ca.country=line1.Country
ON MATCH SET ca.country=line1.Country
ON CREATE SET ca.postCode=line1.postCode 
ON MATCH SET ca.postCode=line1.postCode
ON CREATE SET ca.poBox=line1.POBox
ON MATCH SET ca.poBox=line1.POBox
ON CREATE SET ca.careOf=line1.CareOf
ON MATCH SET ca.careOf=line1.CareOf
ON CREATE SET ca.addressLine1=line1.AddressLine1
ON MATCH SET ca.addressLine1=line1.AddressLine1
ON CREATE SET ca.addressLine2=line1.AddressLine2
ON MATCH SET ca.addressLine2=line1.AddressLine2
MERGE (c)-[:HAS_COMPANY_ADDRESS]->(ca)

The query is running for a long time but the database size does not increase.

Am I doing something wrong here and is there another way to update the nodes?

Upvotes: 0

Views: 1142

Answers (1)

William Lyon
William Lyon

Reputation: 8556

The MERGE clause does exactly this. From the docs:

MERGE either matches existing nodes and binds them, or it creates new data and binds that. It’s like a combination of MATCH and CREATE that additionally allows you to specify what happens if the data was matched or created.

So let's look at your query. This line:

MERGE (c:Company {companyNumber:line1.companyNumber})

ill create a new node with the label Company if there is no existing node with the label Company that has the property companyNumber with a value of whatever line1.companyNumber is for this row in your CSV file. This is similar to the a primary key in a relational database. companyNumber is the thing that uniquely identifies a Company node and you don't want to create duplicate nodes with the same value for this property.

You should also create a uniqueness constraint to enforce this at the database schema level:

CREATE CONSTRAINT ON (c:Company) ASSERT c.companyNumber IS UNIQUE;

But otherwise this first part looks good.

Now the next part of your query:

MERGE (ca:CompanyAddress)
ON CREATE SET ca.county=line1.County
ON MATCH SET ca.county=line1.County
ON CREATE SET ca.country=line1.Country
ON MATCH SET ca.country=line1.Country
ON CREATE SET ca.postCode=line1.postCode 
ON MATCH SET ca.postCode=line1.postCode
ON CREATE SET ca.poBox=line1.POBox
ON MATCH SET ca.poBox=line1.POBox
ON CREATE SET ca.careOf=line1.CareOf
ON MATCH SET ca.careOf=line1.CareOf
ON CREATE SET ca.addressLine1=line1.AddressLine1
ON MATCH SET ca.addressLine1=line1.AddressLine1
ON CREATE SET ca.addressLine2=line1.AddressLine2
ON MATCH SET ca.addressLine2=line1.AddressLine2

MERGE takes a pattern and will then search the graph to see if this pattern exists, creating data if it does not exist. The pattern you are specifying to MERGE on is a node with the label CompanyAddress. This will MATCH on all nodes with the label CompanyAddress regardless of their properties. Then the following SET statements will update the properties for all CompanyAddress nodes. So what you will end up with is that this part of the query will create at most one CompanyAddress node and all existingCompanyAddress nodes will have the same property values, whatever the last row in your CSV file is.

Instead you should MERGE using multiple properties:

MERGE (ca:CompanyAddress {
    county: line1.County, 
    country: line1.Country,
    poBox: line1.POBox,
    careof: line1.Careof,
    addressLine1: line1.AddressLine1,
    addressLine2: line1.AddressLine2
})
MERGE (c)-[:HAS_COMPANY_ADDRESS]->(ca)

Although note that if one field changes this approach will create a new node for the CompanyAddress. If you want to update a field on an address when it changes you'll need to introduce some unique id for the address and MERGE on that property.

UPDATE

Given the data modeling constraint that each Company node is connected to only one CompanyAddress node, this query:

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM
'file:///C:/Users/Zona5/Documents/Neo4j/check/import/cd1_5.csv' AS line1
MERGE (c:Company {companyNumber:line1.companyNumber})
MERGE (c)-[:HAS_COMPANY_ADDRESS]->(ca:CompanyAddress)
SET ca.county=line1.County,
    ca.country=line1.Country,
    ca.postCode=line1.postCode,
    ca.poBox=line1.POBox,
    ca.careOf=line1.CareOf,
    ca.addressLine1=line1.AddressLine1,
    ca.addressLine2=line1.AddressLine2

will create Company, CompanyAddress nodes connected by HAS_COMPANY_ADDRESS relationship if they do not exist, and then update the properties of CompanyAddress.

Upvotes: 1

Related Questions