rahul1205

Reputation: 884

Building a relationship in Neo4j using Neo4j Spark Connector

I am trying to build a simple relationship in Neo4j using the Spark-Neo4j connector. My DataFrame looks like this:

df_new= spark.createDataFrame(
    [("CompanyA",'A','CompanyA','B'),("CompanyB",'B','CompanyB','C') ],
    ["name",'gid','description','parent_gid']
)

The desired tree should look like this: [image: desired tree, with CompanyB as the parent of CompanyA]

The query I wrote looks like this:

query = """
MERGE (c:Company {gid:event.gid})
ON CREATE SET c.name=event.name, c.description=event.description 
ON MATCH SET c.name=event.name, c.description=event.description
MERGE (p:Company {gid:event.parent_gid}) 
MERGE (p)-[:PARENT_OF]->(c)
"""

df_new.write\
    .mode("Overwrite")\
    .format("org.neo4j.spark.DataSource")\
    .option("url", "bolt://localhost:7687")\
    .option("authentication.type", "basic")\
    .option("authentication.basic.username", username)\
    .option("authentication.basic.password", password)\
    .option("query", query)\
    .save()

However, my code ends up creating a new node instead of merging into the existing one, and I end up with two nodes for company B:

[image: resulting graph showing duplicate nodes for gid = "B"]

Upvotes: 1

Views: 801

Answers (1)

Vincent Rupp

Reputation: 655

You have exactly the right logic; there's just some nuance at play that is hard to pin down. This article has your answer; read the section near the end about unique constraints: https://neo4j.com/developer/kb/understanding-how-merge-works/

One solution is to change your query to this:

query = '''
  merge (c:Company {gid:event.gid})
  set c.name = event.name, c.description = event.description
  merge (p:Company {gid:event.parent_gid})
  set p.name = event.name, p.description = event.description
  merge (p)-[:PARENT_OF]->(c)
'''

Now, when Spark performs concurrent write transactions, Cypher has enough uniqueness guarantees to avoid creating a duplicate node for gid = "B".
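A complementary safeguard, suggested by the unique-constraints section of the linked KB article rather than spelled out in this answer, is to create a uniqueness constraint on `gid` before running the Spark write; with the constraint in place, two concurrent MERGE transactions cannot both create a `Company` node with the same `gid`. A minimal sketch (the constraint name is an assumption, and the `FOR ... REQUIRE` syntax needs Neo4j 4.4+):

```cypher
// Assumed constraint name; run once before the Spark job.
// On Neo4j versions before 4.4, the older syntax is:
//   CREATE CONSTRAINT ON (c:Company) ASSERT c.gid IS UNIQUE
CREATE CONSTRAINT company_gid_unique IF NOT EXISTS
FOR (c:Company) REQUIRE c.gid IS UNIQUE
```

With the constraint present, a concurrent MERGE race fails one of the transactions instead of silently producing a second "B" node.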

Upvotes: 0
