Francesco Binucci

Reputation: 71

How to convert a node to multiple relationships in Neo4j

I have a Neo4j database that stores information about scientific articles and their authors. I need to refactor it as shown in the image below: each yellow node represents an author, and each red node is an article. I need to remove the red nodes and connect authors directly if they are co-authors of the same article.

Upvotes: 2

Views: 164

Answers (3)

cybersam

Reputation: 66975

Although this query looks long, it should do what you want efficiently:

MATCH (author:Author)-[r:HAS_WRITTEN]->(article:Article)
DELETE r
WITH article, COLLECT(author) AS authors
DELETE article
WITH authors
UNWIND RANGE(0, SIZE(authors)-2) AS i
WITH authors, i
UNWIND RANGE(i+1, SIZE(authors)-1) AS j
WITH authors[i] AS a1, authors[j] AS a2
CREATE (a1)-[:CO_AUTHOR]->(a2)

It first finds all the HAS_WRITTEN relationships and deletes them. Then it aggregates the authors for each article and deletes the article. And then it creates a CO_AUTHOR relationship between every pair of co-authors (there is no need to check for existing relationships, assuming none of the relationships existed before running the query).

[CORRECTION]

The above query has a flaw, as pointed out in the comments. If the same pair of authors co-authored multiple articles, they would end up connected by multiple CO_AUTHOR relationships. So, this would be a corrected solution:

MATCH (author:Author)-[r:HAS_WRITTEN]->(article:Article)
DELETE r
WITH article, COLLECT(author) AS authors
DELETE article
WITH authors
UNWIND RANGE(0, SIZE(authors)-2) AS i
WITH authors, i
UNWIND RANGE(i+1, SIZE(authors)-1) AS j
WITH authors[i] AS a1, authors[j] AS a2
MERGE (a1)-[:CO_AUTHOR]-(a2)

It first finds all the HAS_WRITTEN relationships and deletes them. Then it aggregates the authors for each article and deletes the article. Finally, it uses MERGE with an undirected relationship pattern to ensure there is a single CO_AUTHOR relationship between every pair of co-authors. The two UNWIND clauses generate each pair of indexes only once per article (always with i < j), so the same pair is never processed in both orders.
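To see how the two UNWIND clauses enumerate the pairs, here is a small standalone sketch that runs them over a literal list instead of database nodes. The values 'A', 'B' and 'C' are placeholders standing in for the collected authors of one article; they are not part of the original data.

```cypher
// Hypothetical stand-in for the authors collected for one article
WITH ['A', 'B', 'C'] AS authors
// i walks every index except the last one
UNWIND range(0, size(authors) - 2) AS i
// j walks only the indexes after i, so each unordered pair appears once
UNWIND range(i + 1, size(authors) - 1) AS j
RETURN authors[i] AS a1, authors[j] AS a2
```

This should return three rows (A/B, A/C and B/C). For a list with a single element, the first range is empty, so no rows and therefore no relationships are produced.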

Upvotes: 2

Mafor

Reputation: 10671

TL;DR:

MATCH (a1:Author)-[:HAS_WRITTEN]-(:Article)-[:HAS_WRITTEN]-(a2:Author) MERGE (a1)-[:CO_AUTHOR]-(a2)
  1. Fetch all author pairs connected through at least one article: MATCH (a1:Author)-[:HAS_WRITTEN]-(:Article)-[:HAS_WRITTEN]-(a2:Author)
  2. Add the CO_AUTHOR relationships. You could use a CREATE clause, but you would end up with duplicate relationships. MERGE ensures each relationship is added only once: MERGE (a1)-[:CO_AUTHOR]-(a2). See the MERGE documentation.

Bear in mind that the direction of the relationships created by MERGE will be arbitrary and should be ignored in queries. For example, to get all co-authors of the author "author1":

MATCH (a1:Author {name: "author1"})-[:CO_AUTHOR]-(a2) RETURN a2
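Note that the MERGE query in this answer only adds CO_AUTHOR relationships and leaves the articles in place. If the articles should also be removed, as the question asks, a follow-up statement along these lines could be run afterwards. This is a sketch that assumes nothing else in the graph still depends on the Article nodes.

```cypher
// Run only after the CO_AUTHOR relationships have been created.
// DETACH DELETE removes each article together with its HAS_WRITTEN relationships.
MATCH (article:Article)
DETACH DELETE article
```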

Upvotes: 2

Graphileon

Reputation: 5385

// get articles with authors
// compare ids to remove duplicates
MATCH (auth1:Author)-[:HAS_WRITTEN]->(article:Article)<-[:HAS_WRITTEN]-(auth2:Author)
WHERE id(auth1) > id(auth2)

// delete the article node
DETACH DELETE article

// MERGE co_author relationship
MERGE (auth1)-[:CO_AUTHOR]->(auth2)
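
Whichever variant is used, it may be worth checking afterwards that no pair of authors ended up with more than one CO_AUTHOR relationship. The following is a minimal sketch of such a check; the alias relCount is introduced here for illustration only.

```cypher
// Look at each unordered pair once by comparing internal ids
// (id() is deprecated in newer Neo4j versions in favour of elementId())
MATCH (a1:Author)-[r:CO_AUTHOR]-(a2:Author)
WHERE id(a1) < id(a2)
// Count the relationships between the two authors, in either direction
WITH a1, a2, count(r) AS relCount
WHERE relCount > 1
RETURN a1, a2, relCount
```

An empty result means every pair of co-authors is connected by exactly one relationship.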

Upvotes: 2
