jos97

Reputation: 395

Neo4j: how to avoid creating a node again if it is already in the database?

I have a question about Cypher queries and updating a database. I have a Python script that does web scraping and generates a CSV at the end. I use this CSV to import data into a Neo4j database.

The scraping runs 5 times a day. Each time, the CSV is updated: the new data is appended to the previous CSV. I import the data after each scraping. Currently, when I import the data to update the DB, all the nodes are created again, even those that are already in the DB.

For example, the first CSV has 5 rows and I insert them into Neo4j. The next scraping yields 2 more rows, so the CSV now has 7 rows. If I insert that data, the first five rows end up in the DB twice. I would like every node to be unique and not added if it is already in the database.

For example, to create an ARTICLE node I do this:

CREATE (a:ARTICLE {id:$id, title:$title, img_url:$img_url, link:$link, sentence:$sentence, published:$published})

I think using MERGE instead of CREATE should solve the problem, but it doesn't, and I can't figure out why.

How can I do this?

Upvotes: 1

Views: 760

Answers (2)

cybersam

Reputation: 66999

A MERGE clause will create its entire pattern if any part of it does not already exist. So, for a MERGE clause to work reasonably, the pattern used with it must only specify the minimum data necessary to uniquely identify a node (or a relationship).

For instance, assuming ARTICLE nodes are supposed to have unique id properties, then you should replace your CREATE clause:

CREATE (a:ARTICLE {id:$id, title:$title, img_url:$img_url, link:$link, sentence:$sentence, published:$published})

with something like this:

MERGE (a:ARTICLE {id:$id})
SET a += {title:$title, img_url:$img_url, link:$link, sentence:$sentence, published:$published}

In the above example, the SET clause always overwrites the non-id properties, even when the node already exists. If you want to set those properties only when the node is created, put ON CREATE before SET (that is, use ON CREATE SET).
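As a minimal sketch of that variant, assuming the same parameters as in the question and that id alone identifies an article, the query could look like this:

```cypher
// Match on id only; create the node if no ARTICLE with this id exists.
MERGE (a:ARTICLE {id: $id})
// These properties are written only when MERGE actually created the node.
// On a re-import, an existing node is matched and left unchanged.
ON CREATE SET a += {title: $title, img_url: $img_url, link: $link, sentence: $sentence, published: $published}
```

With this form, re-importing the full CSV after each scraping should only add nodes for the new rows. The trade-off is that later changes to an already imported article (for example an edited title) are not picked up, so the plain SET version is the better fit if the scraped values can change.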

Upvotes: 2

Lukasmp3

Reputation: 148

Use MERGE instead of CREATE. You can use it for both nodes and relationships.

MERGE (charlie { name: 'Charlie Sheen', age: 10 })

This creates a single node with these properties, unless an existing node already matches all of them.

MATCH (a:Person {name: "Martin"}),
      (b:Person {name: "Marie"})
MERGE (a)-[r:LOVES]->(b)

This finds the LOVES relationship between the two matched nodes, or creates it if it does not exist yet.

Upvotes: 0
