Reputation: 3692

How to add multiple values to existing nodes with Cypher in Neo4J

I'm trying to load some data into Neo4J. I have a Person node that is already set up. Now this node needs an email property that holds an array (or collection); basically, the email property needs to have multiple values, like:

email: ["[email protected]", "[email protected]"]

I've come across similar questions here, but all of the answers show how to set multiple property values at the time the node itself is created, like this query from this answer:

CREATE (e:Employee { name:"Sam",languages: ["C", "C#"]})
RETURN e

But the problem in my case is that the Person node is already created, and I need to set the email property on it now.

This is a small subset of the data I have to load:

Personid|email
933|[email protected]
933|[email protected]
933|[email protected]
1129|[email protected]
1129|[email protected]
1129|[email protected]
4194|[email protected]
4194|[email protected]

Also, the data comes from a CSV file with thousands of rows, so my query needs to be generic; I can't set the property for each individual Person node.

When I was testing the creation of the email property with this subset, my first attempt was this:

 MATCH (n:TESTPERSON{id:933})
 SET n.email = "[email protected]"
 RETURN n

 MATCH (n:TESTPERSON{id:933})
 SET n.email = "[email protected]"
 RETURN n

As I expected, this just overwrites the email property with the value from the most recent query.

After looking at the answers here and at the Cypher docs, I found out that Neo4J allows you to set an array/collection (multiple values of the same type) as a property value, so I tried this:

 // CREATE test node
 CREATE (n:TESTPERSON{id:933})
 RETURN n

 // at this point, this node does not have any `email` property, so set up
 // email as an array with one string value
 MATCH (n:TESTPERSON{id:933})
 SET n.email = ["[email protected]"]
 RETURN n

 // Now, using +, I can append a string to the array of strings
 MATCH (n:TESTPERSON{id:933})
 SET n.email = n.email + "[email protected]"
 RETURN n

 // add a third value to array
 MATCH (n:TESTPERSON{id:933})
 SET n.email = n.email + "[email protected]"
 RETURN n

Here's the result:

As you can see, the email property now has multiple values.

But the problem is that since my CSV file has thousands of rows, I need a generic query to do this.

I thought of using a CASE statement, as per the documentation here, and tried this:

MATCH (n:TESTPERSON {id:933}) 
CASE 
WHEN n.email IS NULL THEN SET n.email = [ "[email protected]"] 
ELSE SET n.email = n.email + "[email protected]" 
RETURN n

But this just throws the error "mismatched input CASE expecting ;".

I was hoping to use this query generically for my CSV file, like this:

LOAD CSV WITH HEADERS FROM 'FILEURL' AS line FIELDTERMINATOR '|'
MATCH (n:TESTPERSON {id:toInt(line.Personid)}) 
CASE 
WHEN n.email IS NULL THEN SET n.email = [line.email] 
ELSE SET n.email = n.email + line.email

But I don't even know if this would work, even if the CASE error is fixed.

I'm really stumped and would appreciate any help. Thank you.

Upvotes: 3

Answers (3)

InverseFalcon

Reputation: 30417

You can use COALESCE() to use a default value in case the value you're trying to get is null. You might use it like this:

... SET n.email = COALESCE(n.email, []) + "[email protected]" ...
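Applied to the whole CSV from the question, the COALESCE approach might look like the sketch below. It assumes the pipe-delimited Personid|email layout shown in the question, TESTPERSON nodes that already exist with integer id values, and that 'FILEURL' is replaced with the real file location. It uses toInteger, which replaced the older toInt in later Neo4j versions. The WHERE filter is an addition of this sketch, not part of the answer, so that re-running the load doesn't append the same addresses again.

```cypher
// Append each CSV row's email to the matching person's email list
LOAD CSV WITH HEADERS FROM 'FILEURL' AS line FIELDTERMINATOR '|'
WITH toInteger(line.Personid) AS pid, trim(line.email) AS email
MATCH (n:TESTPERSON {id: pid})
// Skip addresses already stored, so a re-run doesn't add them twice
WHERE NOT email IN COALESCE(n.email, [])
// COALESCE turns a missing property into an empty list, so the first email starts a new list
SET n.email = COALESCE(n.email, []) + email
```

Because MATCH drops rows whose Personid has no existing node, this query never creates people; it only extends nodes that are already there.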

Whenever you're setting an array of values as a node property, it's a good idea to consider whether you might instead model these as separate nodes with relationships to the original node.

In this case, that means :Email nodes related to your :TESTPERSON nodes: one :Email node per email address, and one relationship from a :TESTPERSON to each of their :Email nodes.
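A sketch of that model loaded from the question's CSV. The relationship type HAS_EMAIL and the address property on :Email are hypothetical names chosen for illustration; the answer doesn't specify them. It also assumes the pipe-delimited file and existing TESTPERSON nodes from the question.

```cypher
// One :Email node per distinct address, linked to its owner
LOAD CSV WITH HEADERS FROM 'FILEURL' AS line FIELDTERMINATOR '|'
WITH toInteger(line.Personid) AS pid, trim(line.email) AS address
MATCH (p:TESTPERSON {id: pid})
// MERGE reuses an existing :Email node with this address instead of creating a duplicate
MERGE (e:Email {address: address})
// MERGE on the relationship avoids linking the same pair twice if the file is re-run
MERGE (p)-[:HAS_EMAIL]->(e)
```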

An advantage here is that you'd be able to use uniqueness constraints if you want to ensure there's only one :Email node per address in the system. You'd also be able to look up a person by email quickly if you have an index or unique constraint on :Email: the query would use the index to find the :Email node, and from there it's only one relationship traversal to the owner of the email.
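For example, with a hypothetical address property on :Email and a hypothetical HAS_EMAIL relationship from the person, a unique constraint and an indexed lookup could look like this. The constraint uses the FOR ... REQUIRE syntax of recent Neo4j versions (4.4 and later); older 3.x releases wrote it as CREATE CONSTRAINT ON (e:Email) ASSERT e.address IS UNIQUE. Run the two statements separately, since schema changes and data queries can't share a transaction; $email is a query parameter supplied by the caller.

```cypher
// Unique constraint on the address; Neo4j backs it with an index
CREATE CONSTRAINT email_address_unique IF NOT EXISTS
FOR (e:Email) REQUIRE e.address IS UNIQUE;

// Index lookup on :Email, then a single hop to the owning person
MATCH (p:TESTPERSON)-[:HAS_EMAIL]->(e:Email {address: $email})
RETURN p;
```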

When values are stored in a collection on a node, you can't use an index lookup to find a value in the collection, so your current model won't be able to look up a person by email quickly.
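For comparison, a lookup against the list-property model has to test list membership on the candidate nodes. A sketch, with $email as a query parameter supplied by the caller:

```cypher
// Tests membership in each TESTPERSON node's email list; as the answer notes,
// there is no index lookup into the list, so every TESTPERSON node is checked
MATCH (p:TESTPERSON)
WHERE $email IN p.email
RETURN p
```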

Upvotes: 5

Jerome_B

Reputation: 1097

A quick workaround is to load your data in two passes:

1/ LOAD CSV, create nodes with an empty array property

2/ LOAD CSV again, and append each email to the array

3/ Optional, depending on your data: for each node, remove duplicates from the array (do it with a custom procedure).
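A sketch of those passes, assuming the question's pipe-delimited file and TESTPERSON label. Step 3 is done here with plain Cypher reduce instead of a custom procedure; that substitution is this sketch's choice, not the answer's. Run the three statements one after another.

```cypher
// Pass 1: make sure every person exists and starts with an empty email list
LOAD CSV WITH HEADERS FROM 'FILEURL' AS line FIELDTERMINATOR '|'
MERGE (n:TESTPERSON {id: toInteger(line.Personid)})
SET n.email = [];

// Pass 2: append each row's email to its person's list
LOAD CSV WITH HEADERS FROM 'FILEURL' AS line FIELDTERMINATOR '|'
MATCH (n:TESTPERSON {id: toInteger(line.Personid)})
SET n.email = n.email + trim(line.email);

// Optional pass 3: drop duplicate addresses, keeping first-seen order
MATCH (n:TESTPERSON)
WHERE n.email IS NOT NULL
SET n.email = reduce(acc = [], e IN n.email |
  CASE WHEN e IN acc THEN acc ELSE acc + e END);
```

Note that pass 1 resets any emails a person already had, which matches the "start from an empty array" idea of this workaround.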

That should do it. I'm not very happy with the CASE syntax either.
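The error in the question comes from using CASE as if it were a statement that wraps SET clauses. In Cypher, CASE is an expression, so it can compute the value that a single SET assigns. A minimal sketch for the question's CSV, assuming the same pipe-delimited file and existing TESTPERSON nodes:

```cypher
LOAD CSV WITH HEADERS FROM 'FILEURL' AS line FIELDTERMINATOR '|'
MATCH (n:TESTPERSON {id: toInteger(line.Personid)})
// CASE picks the new value; SET appears only once, outside it
SET n.email = CASE
    WHEN n.email IS NULL THEN [trim(line.email)]   // first email: start a list
    ELSE n.email + trim(line.email)                // later emails: append
  END
```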

Upvotes: 0

Fabio Lamanna

Reputation: 21584

Try this solution using MERGE:

LOAD CSV WITH HEADERS FROM 'file:///p.csv' AS line FIELDTERMINATOR '|'
MERGE (p:Person {id:toInteger(line.Personid)})
ON CREATE SET p.mail = line.email
ON MATCH SET p.mail = p.mail + '-' + line.email

MERGE takes care of duplicate nodes. ON CREATE SET sets the property only when the node is created, and ON MATCH SET, which runs when the node is already in the database, appends the email address to the property.
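The query above builds a single '-'-separated string. If you want a real list property, as the question asks, the same MERGE pattern can store an array instead. A sketch keeping the answer's file path, :Person label, and mail property; the duplicate check is an addition of this sketch:

```cypher
LOAD CSV WITH HEADERS FROM 'file:///p.csv' AS line FIELDTERMINATOR '|'
MERGE (p:Person {id: toInteger(line.Personid)})
// New node: start the list with this row's email
ON CREATE SET p.mail = [trim(line.email)]
// Existing node: append unless the address is already stored;
// coalesce covers pre-existing nodes that have no mail property yet
ON MATCH SET p.mail = CASE
    WHEN trim(line.email) IN p.mail THEN p.mail
    ELSE coalesce(p.mail, []) + trim(line.email)
  END
```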

Hope that helps.

This is my result in Neo4j:

Upvotes: 0
