user3337008

How to combine similar nodes in Neo4j

I have defined a few nodes and relationships in a Neo4j graph database, but the output is a bit different from what I expected: each node only represents its own data and attributes. I want nodes that are the same to be combined into a single node that shows all of their relationships and attributes.

```
LOAD CSV WITH HEADERS FROM "file:///data.csv" AS line
CREATE(s:SourceID{Name:line.SourceID})
CREATE(t:Title{Name:line.Title})
CREATE(c:Coverage{Name:line.Coverage})
CREATE(p:Publisher{Name:line.Publisher})
MERGE (p)-[:PUBLISHES]->(t)
MERGE (p)-[:Coverage{covers:line.Coverage}]->(t)
MERGE (t)-[:BelongsTO]->(p)
MERGE (s)-[:SourceID]->(t)
```

[Image: graph output showing two separate Springer Nature nodes]

In the picture, there are two Springer Nature nodes. I want only one node, Springer Nature, with all the data associated with both nodes attached to that single node.

Answers (1)

NanisTe

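First of all, I would recommend setting a CONSTRAINT before adding data. The duplicates come from using CREATE for the nodes: every CSV row creates a brand-new SourceID, Title, Coverage and Publisher node, even when a node with the same Name already exists. The MERGE statements only merge the relationship patterns between those freshly created nodes, and nothing in the query says that a node must be unique.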
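To make the diagnosis concrete, here is a toy in-memory model in Python. It is not Neo4j and not how Neo4j stores data; the rows, titles and helper names are hypothetical. It replays the question's pattern (CREATE for nodes, MERGE for the relationship) over two rows that share a publisher, and ends up with two Springer Nature nodes that each hold only their own title, the kind of split the question describes.

```python
# Toy model only (not Neo4j): replays the question's pattern of CREATE for
# nodes plus MERGE for relationships over two hypothetical CSV rows.

rows = [  # hypothetical (Publisher, Title) pairs from data.csv
    ("Springer Nature", "Title A"),
    ("Springer Nature", "Title B"),
]

nodes = []    # node identity = index in this list
rels = set()  # (start index, type, end index); a set mimics MERGE on the pattern


def create_node(label, name):
    # CREATE: always append a new node, even if an equal one already exists.
    nodes.append((label, name))
    return len(nodes) - 1


for publisher, title in rows:
    p = create_node("Publisher", publisher)
    t = create_node("Title", title)
    # MERGE on the relationship only sees this row's new p and t,
    # so it cannot reuse the Springer Nature node from the previous row.
    rels.add((p, "PUBLISHES", t))

springer_ids = [i for i, node in enumerate(nodes) if node == ("Publisher", "Springer Nature")]
rels_per_node = [sum(1 for start, _, _ in rels if start == i) for i in springer_ids]
print(springer_ids)   # [0, 2]: two separate Springer Nature nodes
print(rels_per_node)  # [1, 1]: each one publishes a single title
```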

In your case, create one constraint per node label first:

```
CREATE CONSTRAINT publisherID IF NOT EXISTS FOR (n:Publisher) REQUIRE (n.Name) IS UNIQUE;
CREATE CONSTRAINT sourceID IF NOT EXISTS FOR (n:SourceID) REQUIRE (n.Name) IS UNIQUE;
CREATE CONSTRAINT titleID IF NOT EXISTS FOR (n:Title) REQUIRE (n.Name) IS UNIQUE;
CREATE CONSTRAINT coverageID IF NOT EXISTS FOR (n:Coverage) REQUIRE (n.Name) IS UNIQUE;
```

Even better would be to key the nodes on an ID (for example, a publisher ID) instead of the name. But this is your choice, and if there aren't thousands of publishers in the data, using the name will be no issue at all.

Also, I would not use CREATE for the nodes but MERGE instead. LOAD CSV processes the file row by row, so as soon as a row tries to CREATE a node that already exists (which could happen on the second row or on the fiftieth), the query fails because of the CONSTRAINT above. MERGE matches the existing node and reuses it instead.
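A rough sketch of that failure mode, again as a toy Python model rather than Neo4j itself (`ToyStore`, `ConstraintViolation` and the publisher list are hypothetical): with a uniqueness rule on `Publisher` names, a CREATE-per-row loop stops at the first repeated name, while a MERGE-per-row loop simply reuses the existing node.

```python
# Toy model only (not Neo4j): a node store with an optional uniqueness rule on
# (label, Name), showing why CREATE fails on a repeated value while MERGE does not.


class ConstraintViolation(Exception):
    """Hypothetical stand-in for a unique-constraint error."""


class ToyStore:
    def __init__(self, unique_labels):
        self.unique_labels = set(unique_labels)  # labels whose Name must be unique
        self.nodes = []  # (label, name) tuples

    def create(self, label, name):
        # CREATE: add a node, but refuse a duplicate Name on a constrained label.
        key = (label, name)
        if label in self.unique_labels and key in self.nodes:
            raise ConstraintViolation(f"{label} with Name={name!r} already exists")
        self.nodes.append(key)
        return key

    def merge(self, label, name):
        # MERGE: reuse a matching node if there is one, otherwise create it.
        key = (label, name)
        if key in self.nodes:
            return key
        return self.create(label, name)


publishers = ["Springer Nature", "Elsevier", "Springer Nature"]  # hypothetical CSV column

store = ToyStore(unique_labels=["Publisher"])
for name in publishers:
    store.merge("Publisher", name)
print(len(store.nodes))  # 2

failed_row = None
strict = ToyStore(unique_labels=["Publisher"])
try:
    for row_number, name in enumerate(publishers, start=1):
        strict.create("Publisher", name)
except ConstraintViolation as err:
    failed_row = row_number
    print(f"CREATE failed on row {row_number}: {err}")
```

The toy does not model transactions, so `strict` keeps the two nodes created before the failing row.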

And try everything on a blank database, for example after deleting all nodes:

```
MATCH (n) DETACH DELETE n
```

To sum up, here is the whole thing. Send each statement separately:

```
CREATE CONSTRAINT publisherID IF NOT EXISTS FOR (n:Publisher) REQUIRE (n.Name) IS UNIQUE;
CREATE CONSTRAINT sourceID IF NOT EXISTS FOR (n:SourceID) REQUIRE (n.Name) IS UNIQUE;
CREATE CONSTRAINT titleID IF NOT EXISTS FOR (n:Title) REQUIRE (n.Name) IS UNIQUE;
CREATE CONSTRAINT coverageID IF NOT EXISTS FOR (n:Coverage) REQUIRE (n.Name) IS UNIQUE;
```

```
LOAD CSV WITH HEADERS FROM "file:///data.csv" AS line
MERGE(s:SourceID{Name:line.SourceID})
MERGE(t:Title{Name:line.Title})
MERGE(c:Coverage{Name:line.Coverage})
MERGE(p:Publisher{Name:line.Publisher})
MERGE (p)-[:PUBLISHES]->(t)
MERGE (p)-[:Coverage{covers:line.Coverage}]->(t)
MERGE (t)-[:BelongsTO]->(p)
MERGE (s)-[:SourceID]->(t)
// count(DISTINCT x) counts nodes; plain count(x) would count CSV rows
RETURN count(DISTINCT p), count(DISTINCT t), count(DISTINCT c), count(DISTINCT s);
```
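As a sanity check of what the MERGE-based load should produce, here is a toy Python replay of its MERGE statements over three hypothetical CSV rows. The values and helper names are made up, and this only models MERGE's match-or-create behaviour, not Neo4j: nodes are keyed by label and Name, relationships by start node, type, properties and end node.

```python
# Toy replay (not Neo4j) of the MERGE-based load over hypothetical CSV rows.
# Nodes are keyed by (label, Name); relationships by (start, type, props, end).

rows = [  # hypothetical data.csv contents
    {"SourceID": "S1", "Title": "Title A", "Coverage": "2000-2010", "Publisher": "Springer Nature"},
    {"SourceID": "S2", "Title": "Title B", "Coverage": "2005-2020", "Publisher": "Springer Nature"},
    {"SourceID": "S3", "Title": "Title C", "Coverage": "2000-2010", "Publisher": "Elsevier"},
]

nodes = set()  # {(label, name)}
rels = set()   # {(start_node, rel_type, frozen_props, end_node)}


def merge_node(label, name):
    # Adding an existing key is a no-op, like MERGE matching an existing node.
    node = (label, name)
    nodes.add(node)
    return node


def merge_rel(start, rel_type, end, **props):
    # Same start, type, properties and end -> the same relationship.
    rels.add((start, rel_type, frozenset(props.items()), end))


for line in rows:
    s = merge_node("SourceID", line["SourceID"])
    t = merge_node("Title", line["Title"])
    merge_node("Coverage", line["Coverage"])
    p = merge_node("Publisher", line["Publisher"])
    merge_rel(p, "PUBLISHES", t)
    merge_rel(p, "Coverage", t, covers=line["Coverage"])
    merge_rel(t, "BelongsTO", p)
    merge_rel(s, "SourceID", t)

springer = ("Publisher", "Springer Nature")
titles = sorted(end[1] for start, rel_type, _, end in rels
                if start == springer and rel_type == "PUBLISHES")
publisher_count = sum(1 for label, _ in nodes if label == "Publisher")
print(publisher_count)         # 2
print(titles)                  # ['Title A', 'Title B']
print(len(nodes), len(rels))   # 10 12
```

Springer Nature ends up as a single node connected to both of its titles, which is the combined node the question asks for.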
