Wahaj Ahmad

Reputation: 53

Can neo4j create the map automatically from the json file if the relationships are defined in the json file?

I have a json file that defines the nodes and their relationships. It looks something like this:

{"p":{"type":"node","id":"0","labels":["Paintings"],"properties":{"date":"1659-01-01T00:00:00","img":"removed-for-brevity(RFB)","name":"King Caspar","sitelink":"1","description":"RFB","exhibit":"RAB","uri":"RFB"}},"r":{"id":"144","type":"relationship","label":"on_MATERIAL","start":{"id":"0","labels":["Paintings"]},"end":{"id":"2504","labels":["Material"]}},"n":{"type":"node","id":"2504","labels":["Material"],"properties":{"name":"oak","sitelink":5,"description":"RFB","uri":"RFB"}}}

"p" is the first node, "r" is the relationship, "n" is the second node.

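To make that layout concrete, here is a minimal sketch that reads one such line and pulls out the start node, relationship type, and end node. The record is a shortened copy of the one above, and the `describe` helper is a hypothetical name used only for illustration.

```python
import json

# Shortened copy of the record from the question; most properties are omitted.
line = (
    '{"p":{"type":"node","id":"0","labels":["Paintings"],'
    '"properties":{"name":"King Caspar"}},'
    '"r":{"id":"144","type":"relationship","label":"on_MATERIAL",'
    '"start":{"id":"0","labels":["Paintings"]},'
    '"end":{"id":"2504","labels":["Material"]}},'
    '"n":{"type":"node","id":"2504","labels":["Material"],'
    '"properties":{"name":"oak"}}}'
)

def describe(record):
    """Return (start label, start id, relationship type, end label, end id)."""
    p, r, n = record["p"], record["r"], record["n"]
    return (p["labels"][0], p["id"], r["label"], n["labels"][0], n["id"])

triple = describe(json.loads(line))
print(triple)
```
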
Is it possible for neo4j to create a graph/map automatically from this json file, without having to define the nodes and relationships through cypher manually?

I am fairly new to neo4j. I tried following the examples on the Load JSON page, but they define the nodes and their relationships manually, which I want to avoid.

Upvotes: 0

Views: 319

Answers (2)

Wahaj Ahmad

Reputation: 53

It looks like neo4j can't automatically create a graph data model using a json file (as @cybersam pointed out earlier).

I ended up writing a Python script to do this for me. Posting this here just in case it helps someone. It does the job for me!

from neo4j import GraphDatabase
import json

# Connect to Neo4j
uri = "bolt://localhost:7687"
username = "_username_"
password = "_password_"

driver = GraphDatabase.driver(uri, auth=(username, password))

processed_painting_ids = set()  # tracks the IDs of "p" (painting) nodes already created
processed_node_ids = set()

# Load JSON data from file
with open("data_json.json", "r") as file:
    for line in file:
        json_data = json.loads(line)

        p_data = json_data["p"]
        r_data = json_data["r"]
        n_data = json_data["n"]

        p_unique_id = p_data.get("id") #keeps track of the id of the "p" node. 

        # Handle missing values in the data
        p_id = str(p_data["id"])
        p_date = str(p_data["properties"].get("date", "Unknown date"))
        p_img = p_data["properties"].get("img", "Unknown img")
        p_name = p_data["properties"].get("name", "Unknown name")
        p_sitelink = str(p_data["properties"].get("sitelink", "Unknown sitelink"))
        p_description = p_data["properties"].get("description", "Unknown description")
        p_exhibit = p_data["properties"].get("exhibit", "Unknown exhibit")
        p_uri = str(p_data["properties"].get("uri", "Unknown uri"))

        r_id = str(r_data["id"])
        r_label = r_data["label"]
        start_id = str(r_data["start"]["id"])
        end_id = str(r_data["end"]["id"])

        n_id = str(n_data["id"])
        n_name = n_data["properties"].get("name", "Unknown name")
        n_sitelink = str(n_data["properties"].get("sitelink","Unknown sitelink"))
        n_description = n_data["properties"].get("description","Unknown description")
        n_uri = n_data["properties"].get("uri","Unknown uri")

        with driver.session() as session:

            # Create the "n" node once per id. Labels cannot be query parameters,
            # so the label is concatenated; property values are passed as parameters.
            if n_id not in processed_node_ids:
                session.run(
                    "CREATE (n:" + n_data["labels"][0] + " {id: $id, name: $name, "
                    "sitelink: $sitelink, description: $description, uri: $uri})",
                    {"id": int(n_id), "name": n_name, "sitelink": n_sitelink,
                     "description": n_description, "uri": n_uri})
                processed_node_ids.add(n_id)
            # Create the "p" node only if it has not been seen before
            if p_unique_id not in processed_painting_ids:
                session.run(
                    "CREATE (p:" + p_data["labels"][0] + " {id: $id, date: $date, "
                    "img: $img, name: $name, sitelink: $sitelink, "
                    "description: $description, exhibit: $exhibit, uri: $uri})",
                    {"id": int(p_id), "date": p_date, "img": p_img, "name": p_name,
                     "sitelink": p_sitelink, "description": p_description,
                     "exhibit": p_exhibit, "uri": p_uri})
                # Add id of the node to the set
                processed_painting_ids.add(p_unique_id)
            # Create the "r" relationship between the two nodes
            session.run(
                "MATCH (start), (end) WHERE start.id = $start_id AND end.id = $end_id "
                "CREATE (start)-[r:" + r_label + " {id: $r_id}]->(end)",
                {"start_id": int(start_id), "end_id": int(end_id), "r_id": int(r_id)})

driver.close()
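A possible variation, sketched here under some assumptions rather than taken from the script above: build each node query with `MERGE` so that re-running the import does not create duplicates, and check the label before it is placed in the query text, since labels cannot be passed as Cypher parameters. The helper name `build_node_query` and the identifier pattern are hypothetical, and the id is kept as the string found in the JSON rather than converted to a number.

```python
import re

# Labels cannot be Cypher parameters, so they are validated before being
# concatenated into the query text. The allowed pattern is an assumption.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def build_node_query(node):
    """Hypothetical helper: return (query, params) that MERGEs one exported node."""
    label = node["labels"][0]
    if not _IDENTIFIER.match(label):
        raise ValueError("unexpected label: " + repr(label))
    # MERGE on the id only, then copy every exported property onto the node.
    query = "MERGE (x:" + label + " {id: $id}) SET x += $props"
    params = {"id": node["id"], "props": node.get("properties", {})}
    return query, params

example_node = {
    "type": "node",
    "id": "2504",
    "labels": ["Material"],
    "properties": {"name": "oak"},
}
query, params = build_node_query(example_node)
print(query)
print(params)
```

The returned pair could then be passed to the driver as `session.run(query, params)`. Because every property is copied with `SET x += $props`, missing properties are simply absent instead of being filled with "Unknown ..." placeholders.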

Upvotes: 1

cybersam

Reputation: 66989

No, there is no automated way, and even if there were, the generated result could be suboptimal or even wrong for your use cases.

You need to design the graph data model (node labels, relationship types, etc.) yourself. There are many considerations (like your use cases, and the necessary indexes and constraints) that are not revealed by a simple JSON data dump. Also, you need to understand the schema of the JSON and determine how to map that to your data model.
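As a starting point for understanding the schema of the JSON, a rough sketch like the following could list which labels, property keys, and relationship types actually occur in the export. It assumes one JSON object per line with the "p"/"r"/"n" layout shown in the question; the `survey` function and the two sample records are made up for illustration.

```python
import json
from collections import defaultdict

def survey(lines):
    """Collect the labels, property keys and relationship types seen in the export."""
    node_props = defaultdict(set)  # label -> property keys seen for that label
    rel_types = set()              # (start label, relationship type, end label)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        for key in ("p", "n"):
            node = record[key]
            node_props[node["labels"][0]].update(node.get("properties", {}))
        r = record["r"]
        rel_types.add((r["start"]["labels"][0], r["label"], r["end"]["labels"][0]))
    return dict(node_props), rel_types

def make_line(p_id, p_props, n_id, n_props):
    """Build one sample export line in the same shape as the question's data."""
    return json.dumps({
        "p": {"type": "node", "id": p_id, "labels": ["Paintings"], "properties": p_props},
        "r": {"id": "1", "type": "relationship", "label": "on_MATERIAL",
              "start": {"id": p_id, "labels": ["Paintings"]},
              "end": {"id": n_id, "labels": ["Material"]}},
        "n": {"type": "node", "id": n_id, "labels": ["Material"], "properties": n_props},
    })

sample = [
    make_line("0", {"name": "King Caspar", "date": "1659-01-01T00:00:00"}, "2504", {"name": "oak"}),
    make_line("1", {"name": "Untitled"}, "2504", {"name": "oak"}),
]
node_props, rel_types = survey(sample)
print(node_props)
print(rel_types)
```

The same function could be given an open file object instead of the sample list. The output only describes what is in the dump; which properties deserve indexes or uniqueness constraints still depends on the queries you plan to run.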

Upvotes: 1
