Jome

Reputation: 1335

Best way to batch insert using cypher in java code

I'm not sure if this has been answered already, but here goes. I have a Neo4j DB already populated with, let's say, 100k nodes labelled Person. I want to import the activities these persons have created and label them Activity. I have a CSV of about 10 million activities which I would like to import into Neo4j.

The method below is what I use for this. For each row it builds a Cypher statement that looks up the user associated with the activity, creates the activity node and establishes a relationship between the user and the activity.

public void addActivityToGraph(List<String> activities) {

    Map<String, Object> params = new HashMap<>();

    for (String r : activities) {
        String[] rd = r.split(";");
        log.info("Row count: " + (rowCount + 1) + "| " + r);
        log.info("Row count: " + (rowCount + 1)
                + "| Array Length: " + rd.length);

        Map<String, Object> props = new HashMap<>();

        props.put("activityid", Long.parseLong(rd[0]));
        props.put("objecttype", Integer.parseInt(rd[1]));
        props.put("objectid", Integer.parseInt(rd[2]));
        props.put("containertype", Integer.parseInt(rd[3]));
        props.put("containerid", Integer.parseInt(rd[4]));
        props.put("activitytype", Integer.parseInt(rd[5]));
        props.put("creationdate", Long.parseLong(rd[7]));

        params.put("props", props);
        params.put("userid", Integer.parseInt(rd[6]));

        try (Transaction tx = gd.beginTx()) {
            // engine is a RestCypherQueryEngine
            engine.query("match (p:Person{userid:{userid}}) create unique (p)-[:created]->(a:Activity{props})", params);

            params.clear();
            tx.success();

        }
    }

}

While this works, I'm sure I am not using the right mix of tools, as this process takes a whole day to finish. There has to be an easier way. I see a lot of documentation on the Batch REST API, but I've not seen any that covers my case (find an already existing user, then create a relationship between that user and a new activity). I appreciate all the help I can get here.

Thanks.

Upvotes: 1

Views: 1641

Answers (2)

Michael Hunger

Reputation: 41676

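There are no transactions with the rest-query-engine over the wire, so the beginTx() block in your code does not group anything and every row goes out as its own request. You could use batching, but I think it is more sensible to use something like my neo4j-shell-tools to load your CSV file.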

Install them as outlined here, then use:

import-cypher -i activities.csv MATCH (p:Person{userid:{userid}}) CREATE (p)-[:created]->(a:Activity{activityid:{activityid}, ....})

Make sure to have indexes/constraints for your :Person(userid) and :Activity(activityid) to make matching and merging fast.
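
If you do want to stay in Java and try the batching route, a minimal sketch could look like the one below. It only prepares the parameters: it parses the semicolon-delimited rows with the column order from the question and groups them into chunks, so that one statement is sent per chunk instead of one per row. ActivityBatcher, toBatches and BATCH_QUERY are hypothetical names, not part of any Neo4j library, and the statement assumes that UNWIND is available on your server version. I have not run it against a live database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of any Neo4j library.
class ActivityBatcher {

    // Assumption: the server supports UNWIND. Uses the old {param} syntax
    // that the question's query also uses.
    static final String BATCH_QUERY =
            "UNWIND {rows} AS row "
          + "MATCH (p:Person {userid: row.userid}) "
          + "CREATE (p)-[:created]->(a:Activity) "
          + "SET a = row.props";

    // Parses one line using the column order from the question's code.
    static Map<String, Object> toRow(String line) {
        String[] rd = line.split(";");
        Map<String, Object> props = new HashMap<>();
        props.put("activityid", Long.parseLong(rd[0]));
        props.put("objecttype", Integer.parseInt(rd[1]));
        props.put("objectid", Integer.parseInt(rd[2]));
        props.put("containertype", Integer.parseInt(rd[3]));
        props.put("containerid", Integer.parseInt(rd[4]));
        props.put("activitytype", Integer.parseInt(rd[5]));
        props.put("creationdate", Long.parseLong(rd[7]));

        Map<String, Object> row = new HashMap<>();
        row.put("userid", Integer.parseInt(rd[6]));
        row.put("props", props);
        return row;
    }

    // Returns one parameter map per batch; each map holds up to batchSize
    // parsed rows under the key "rows".
    static List<Map<String, Object>> toBatches(List<String> lines, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Map<String, Object>> batches = new ArrayList<>();
        List<Map<String, Object>> current = new ArrayList<>();
        for (String line : lines) {
            current.add(toRow(line));
            if (current.size() == batchSize) {
                batches.add(asParams(current));
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(asParams(current)); // last, partially filled batch
        }
        return batches;
    }

    private static Map<String, Object> asParams(List<Map<String, Object>> rows) {
        Map<String, Object> params = new HashMap<>();
        params.put("rows", rows);
        return params;
    }
}
```

Each returned map would then be passed to the engine from the question, for example engine.query(ActivityBatcher.BATCH_QUERY, params), once per batch. Note that this uses a plain CREATE, as in the import-cypher line above, so running the same file twice would create duplicate activities.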

Upvotes: 0

FrobberOfBits

Reputation: 18002

There are many ways to do batch import into Neo4j.

If you're using the 2.1 milestone release, there's a LOAD CSV clause in Cypher.

If you already have structured CSV, I'd suggest not writing a bunch of Java code to import it. Explore the available tools and go from there.

Using the new Cypher clause, it might look something like this. The query can be run in the neo4j shell, or from Java if you prefer.

LOAD CSV WITH HEADERS FROM "file:///tmp/myPeople.csv" AS csvLine
MERGE (p:Person { userid: csvLine.userid})
MERGE (a:Activity { someProperty: csvLine.someProperty })
CREATE UNIQUE (p)-[:created]->(a)
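
The query above uses WITH HEADERS, while the file in the question appears to be semicolon-delimited with no header row (judging by the split(";") and the positional indexes in the Java code). If you do reach for a little Java, one place it could help is preparing the file. Below is a minimal sketch that prepends a header and switches the delimiter to commas. ActivityCsvHeader and withHeader are hypothetical names, the column names are taken from the property names in the question, and the sketch assumes every field is a plain number with no quotes or embedded commas.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of any Neo4j tooling.
class ActivityCsvHeader {

    // Column order assumed from the indexes used in the question's code
    // (rd[6] is the user id, rd[7] the creation date).
    static final String HEADER =
            "activityid,objecttype,objectid,containertype,"
          + "containerid,activitytype,userid,creationdate";

    // Returns the header line followed by the input lines with ';' replaced
    // by ','. Assumes purely numeric fields, so no quoting is handled.
    static List<String> withHeader(List<String> lines) {
        List<String> out = new ArrayList<>(lines.size() + 1);
        out.add(HEADER);
        for (String line : lines) {
            out.add(line.replace(';', ','));
        }
        return out;
    }
}
```

Keep in mind that LOAD CSV reads every value as a string. The Person nodes in the question store userid as an integer, so the value would likely need converting in the query (for example with a function such as toInt) before it can match them.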

Upvotes: 1
