Reputation: 3692
I'm trying to load some data in Neo4J. I have a Person
node which is already setup. Now, this node needs to have an email
property which should be an array(or collection). Basically, the email
property needs to have multiple values, like -
email: ["[email protected]", "[email protected]"]
I've come across similar questions here but all of the answers indicate to setting multiple property values at the time the node itself is created. Like this query from this answer -
CREATE (e:Employee { name:"Sam",languages: ["C", "C#"]})
RETURN e
But the problem in my case is that Person
node is already created, and I need to set the email
property on it now.
This is a small subset of the data I have to load -
Personid|email
933|[email protected]
933|[email protected]
933|[email protected]
1129|[email protected]
1129|[email protected]
1129|[email protected]
4194|[email protected]
4194|[email protected]
Also, the data is coming from a CSV file with thousands of rows, so my query needs to be generic, I can't set the properties for each individual Person
node.
When I was testing out the creation of the email property with this subset, my first attempt was this -
MATCH (n:TESTPERSON{id:933})
SET n.email = "[email protected]"
RETURN n
MATCH (n:TESTPERSON{id:933})
SET n.email = "[email protected]"
RETURN n
As I was thinking, this just overwrites the email
property to the value in the most recent query.
After looking at the answers here and on the Cypher docs, I found out that Neo4J allows you to set an array/collection (multiple values of the same type) as a property value, and then I tried this -
// CREATE test node
CREATE (n:TESTPERSON{id:933})
RETURN n
// at this time, this node does not have any `email` property, so setup
// email as an array with one string value
MATCH (n:TESTPERSON{id:933})
SET n.email = ["[email protected]"]
RETURN n
// Now, using +=, I can append to the array of strings
MATCH (n:TESTPERSON{id:933})
SET n.email = n.email + "[email protected]"
RETURN n
// add a third value to array
MATCH (n:TESTPERSON{id:933})
SET n.email = n.email + "[email protected]"
RETURN n
As you can see, the email
property now has multiple values.
But the problem is that since my CSV file has thousands of rows, I need a generic query to do this.
I thought of using a CASE
statement as per the documentation here, and tried this -
MATCH (n:TESTPERSON {id:933})
CASE
WHEN n.email IS NULL THEN SET n.email = [ "[email protected]"]
ELSE SET n.email = n.email + "[email protected]"
RETURN n
But this just throws the error - mismatched input CASE expecting ;
.
I was hoping I could use this query as a generic way for my CSV file like this -
LOAD CSV WITH HEADERS FROM 'FILEURL' AS line FIELDTERMINATOR `|`
MATCH (n:TESTPERSON {id:toInt(line.Personid)})
CASE
WHEN n.email IS NULL THEN SET n.email = [line.email]
ELSE SET n.email = n.email + line.email
But I don't even know if this would work, even if the CASE
error is fixed.
I'm really stumped, and would appreciate any help. Thank You.
Upvotes: 3
Views: 3024
Reputation: 30397
You can use COALESCE() to use a default value in case the value you're trying to get is null. You might use it like this:
...
SET n.email = COALESCE(n.email, []) + "[email protected]"
...
Whenever you're setting an array of values as a node property, it's a good idea to consider whether you might instead model these as separate nodes with relationships to the original node.
In this case, :Email nodes with some relationship to your :TESTPERSON nodes, with one :Email node per email, and multiple relationships from :TESTPERSON to multiple :Emails.
An advantage here is you'd be able to support uniqueness constraints, if you want to ensure there's only one :Email in the system, and you would be able to quickly look up a person by their email if you have an index or unique constraint, as the query would use the index to lookup the :Email and from there it's only one relationship traversal to the owner of the email.
When you have values in a collection on a node, you can't use an index lookup to a value in the collection, so your current model won't be able to quickly lookup a person by their email.
Upvotes: 5
Reputation: 1097
A quick workaraound is to load your data in two steps
1/ LOAD CSV, create node with empty array property
2/LOAD CSV again, set emails +=
3/ Optionnal, depending on your data for each node, remove doubles in the array (do it with a custom procedure).
Should do it. I also am not very happy with the CASE syntax
Upvotes: 0
Reputation: 21542
Try this solution using MERGE
:
LOAD CSV WITH HEADERS FROM 'file:///p.csv' AS line FIELDTERMINATOR '|'
MERGE (p:Person {id:toInteger(line.Personid)})
ON CREATE SET p.mail = line.email
ON MATCH SET p.mail = p.mail + '-' + line.email
The MERGE
command take care of the duplicate nodes, and then we're setting the properties only when the node is created with ON CREATE SET
, and when the node is already in the database (i.e. ON MATCH SET
), we're going to add the email address to the property.
Hope that helps.
Upvotes: 0