Neo4j Insertion taking more time

Question

We have about 50,000 nodes and 8,000,000 (80 lakh) edges.

We are trying to insert this data into Neo4j (embedded graph database) using Java, but it is taking a long time (several hours).

We want to know whether we are doing something wrong in the insertion. We are using automatic indexes for nodes. The complete implementation is given below.

Please let me know what is going wrong and what changes to make to the code below.

public static void main(String[] args)
{

        // TODO Auto-generated method stub
        nodeGraph obj = new nodeGraph();
        obj.createDB();
        System.out.println("Graph Database Initialised");
        obj.parseNodesCsv();
        System.out.println("Creating relationships in process....");
        obj.parseEdgesCsv();
        obj.shutDown();

}

public void createDB() {

        graphDb = new GraphDatabaseFactory().newEmbeddedDatabaseBuilder( DB_PATH ).
        setConfig( GraphDatabaseSettings.node_keys_indexable, "id,name" ).
        setConfig( GraphDatabaseSettings.relationship_keys_indexable, "rel" ).
        setConfig( GraphDatabaseSettings.node_auto_indexing, "true" ).
        setConfig( GraphDatabaseSettings.relationship_auto_indexing, "true" ).
        newGraphDatabase();             
        registerShutdownHook(graphDb);
        // Get the Node AutoIndexer, set nodeProp1 and nodeProp2 as auto
        // indexed.
        AutoIndexer nodeAutoIndexer = graphDb.index().getNodeAutoIndexer();
        nodeAutoIndexer.startAutoIndexingProperty( "id" );
        nodeAutoIndexer.startAutoIndexingProperty( "name" );

        // Get the Relationship AutoIndexer
        //AutoIndexer relAutoIndexer = graphDb.index().getRelationshipAutoIndexer();
        //relAutoIndexer.startAutoIndexingProperty( "relProp1" );

        // None of the AutoIndexers are enabled so far. Do that now
        nodeAutoIndexer.setEnabled( true );
        //relAutoIndexer.setEnabled( true );
}

public void parseNodesCsv(){

        try 
        {
            CSVReader reader= new CSVReader(new FileReader("/home/sandy/Desktop/workspacesh/importToNeo4j/nodesNeo.csv"),'\t','"');
            String rows[]=null;
            while ((rows=reader.readNext())!=null) 
            {
                createNode(rows);
                System.out.println(rows[0]);

            }
            reader.close();
        } 


        catch (FileNotFoundException e) 
        {
            // TODO Auto-generated catch block
            System.err.println("Error: cannot find datasource.");
            e.printStackTrace();
        } 
        catch (IOException e) 
        {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } 
}

public void parseEdgesCsv(){

        try 
        {
            CSVReader reader= new CSVReader(new FileReader("/home/sandy/Desktop/workspacesh/importToNeo4j/edgesNeo.csv"),',','"');
            String rows[]=null; 
            while ((rows=reader.readNext())!=null) 
            {
                createRelationshipsUsingIndexes(rows);

            }
            reader.close();
        }   


        catch (FileNotFoundException e) 
        {
            // TODO Auto-generated catch block
            System.err.println("Error: cannot find datasource.");
            e.printStackTrace();
        } 
        catch (IOException e) 
        {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } 
}


public void createNode(String[] rows){

         Transaction tx = graphDb.beginTx();
         try 
            {   
                firstNode = graphDb.createNode(DynamicLabel.label( rows[2] ));
                firstNode.setProperty("id",rows[0] );
                firstNode.setProperty("name",rows[1] );
                System.out.println(firstNode.getProperty("id"));
                tx.success();
            } 
            finally
            {
                tx.finish();
            }

}

public void createRelationshipsUsingIndexes(String rows[]){

        Transaction tx = graphDb.beginTx();
        try
        {
            ReadableIndex autoNodeIndex = graphDb.index().getNodeAutoIndexer().getAutoIndex();
            // node1 and node2 both had auto indexed properties, get them
            firstNode=autoNodeIndex.get( "id", rows[0] ).getSingle();
            secondNode=autoNodeIndex.get( "id", rows[1] ).getSingle();

            relationship = firstNode.createRelationshipTo( secondNode, RelTypes.CO_OCCURRED );
            relationship.setProperty( "frequency", rows[2] );
            relationship.setProperty( "generatability_score", rows[3] );
            tx.success();   

        }
        finally
        {
              tx.finish();
        }


}

Michael Hunger · Accepted Answer

What is the memory config (heap) you are using for your import? What OS are you running on (I assume some Linux) and what Neo4j version are you using?

I recommend upgrading to the latest stable version, Neo4j 2.0.3.

There are a few problems with your import:

You're not passing memory-mapped I/O (mmio) settings.
Don't use the legacy indexes (auto-indexing is one of them).
Don't use one transaction per node; use one transaction per 50k nodes or 50k relationships.
Don't read from the index during insertion; keep that information in an in-memory structure (e.g. a Map).
Don't print output for each node; print once per transaction commit instead (every 50k elements).
Wrap your FileReader in a BufferedReader for better CSV read performance.
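
The batching and in-memory lookup points above could be sketched roughly as follows. This is only an illustration: GraphStore is a hypothetical stand-in for the database (it is not a Neo4j API), and the class and method names are made up for the example. With embedded Neo4j, begin() and commit() would correspond to graphDb.beginTx() and marking the transaction successful before closing it.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a batched import; GraphStore is a hypothetical stand-in, not a Neo4j API. */
final class BatchedImport {

    /** Hypothetical store abstraction used only to keep the sketch self-contained. */
    interface GraphStore {
        void begin();
        long createNode(String id, String name, String label);
        void createEdge(long from, long to, String frequency, String score);
        void commit();
    }

    private final GraphStore store;
    private final int batchSize;
    // CSV id -> internal node id, so edge creation never has to query an index
    private final Map<String, Long> nodeIds = new HashMap<>();
    private int pending = 0;      // operations in the currently open transaction
    private boolean open = false; // whether a transaction is currently open

    BatchedImport(GraphStore store, int batchSize) {
        this.store = store;
        this.batchSize = batchSize;
    }

    /** Expects a row of {id, name, label}, as in the question's node CSV. */
    void addNode(String[] row) {
        ensureOpen();
        nodeIds.put(row[0], store.createNode(row[0], row[1], row[2]));
        countAndMaybeCommit();
    }

    /** Expects a row of {fromId, toId, frequency, score}, as in the edge CSV. */
    void addEdge(String[] row) {
        Long from = nodeIds.get(row[0]);
        Long to = nodeIds.get(row[1]);
        if (from == null || to == null) {
            throw new IllegalArgumentException("Edge refers to an unknown node id");
        }
        ensureOpen();
        store.createEdge(from, to, row[2], row[3]);
        countAndMaybeCommit();
    }

    /** Commits whatever is left in a partially filled batch. */
    void finish() {
        if (open) {
            commit();
        }
    }

    private void ensureOpen() {
        if (!open) {
            store.begin();
            open = true;
        }
    }

    private void countAndMaybeCommit() {
        pending++;
        if (pending >= batchSize) {
            commit();
        }
    }

    private void commit() {
        store.commit();
        // One log line per commit instead of one per node
        System.out.println("Committed batch of " + pending + " operations");
        pending = 0;
        open = false;
    }
}
```

With a batch size of 50,000, the CSV loops would call addNode or addEdge for each row and finish() once at the end, so the 8,000,000 edges end up in about 160 commits instead of 8,000,000.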

It would make more sense to use my batch-importer for fast initial imports.
