Reputation: 53
We have about 50,000 nodes and 8,000,000 (80 lakh) edges.
We are trying to insert this data into Neo4j (embedded graph database) using Java, but it is taking a very long time (several hours).
We want to know whether we are going wrong anywhere in the insertion. We are using automatic indexes for nodes. The complete implementation is given below.
Please let me know what is going wrong and what changes to make to the code below.
    public static void main(String[] args)
    {
        // TODO Auto-generated method stub
        nodeGraph obj = new nodeGraph();
        obj.createDB();
        System.out.println("Graph Database Initialised");
        obj.parseNodesCsv();
        System.out.println("Creating relationships in process....");
        obj.parseEdgesCsv();
        obj.shutDown();
    }

    public void createDB() {
        graphDb = new GraphDatabaseFactory().newEmbeddedDatabaseBuilder( DB_PATH ).
            setConfig( GraphDatabaseSettings.node_keys_indexable, "id,name" ).
            setConfig( GraphDatabaseSettings.relationship_keys_indexable, "rel" ).
            setConfig( GraphDatabaseSettings.node_auto_indexing, "true" ).
            setConfig( GraphDatabaseSettings.relationship_auto_indexing, "true" ).
            newGraphDatabase();
        registerShutdownHook(graphDb);
        // Get the Node AutoIndexer, set nodeProp1 and nodeProp2 as auto
        // indexed.
        AutoIndexer<Node> nodeAutoIndexer = graphDb.index().getNodeAutoIndexer();
        nodeAutoIndexer.startAutoIndexingProperty( "id" );
        nodeAutoIndexer.startAutoIndexingProperty( "name" );
        // Get the Relationship AutoIndexer
        //AutoIndexer<Relationship> relAutoIndexer = graphDb.index().getRelationshipAutoIndexer();
        //relAutoIndexer.startAutoIndexingProperty( "relProp1" );
        // None of the AutoIndexers are enabled so far. Do that now
        nodeAutoIndexer.setEnabled( true );
        //relAutoIndexer.setEnabled( true );
    }

    public void parseNodesCsv(){
        try
        {
            CSVReader reader= new CSVReader(new FileReader("/home/sandy/Desktop/workspacesh/importToNeo4j/nodesNeo.csv"),' ','"');
            String rows[]=null;
            while ((rows=reader.readNext())!=null)
            {
                createNode(rows);
                System.out.println(rows[0]);
            }
            reader.close();
        }
        catch (FileNotFoundException e)
        {
            // TODO Auto-generated catch block
            System.err.println("Error: cannot find datasource.");
            e.printStackTrace();
        }
        catch (IOException e)
        {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

    public void parseEdgesCsv(){
        try
        {
            CSVReader reader= new CSVReader(new FileReader("/home/sandy/Desktop/workspacesh/importToNeo4j/edgesNeo.csv"),',','"');
            String rows[]=null;
            while ((rows=reader.readNext())!=null)
            {
                createRelationshipsUsingIndexes(rows);
            }
            reader.close();
        }
        catch (FileNotFoundException e)
        {
            // TODO Auto-generated catch block
            System.err.println("Error: cannot find datasource.");
            e.printStackTrace();
        }
        catch (IOException e)
        {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

    public void createNode(String[] rows){
        Transaction tx = graphDb.beginTx();
        try
        {
            firstNode = graphDb.createNode(DynamicLabel.label( rows[2] ));
            firstNode.setProperty("id",rows[0] );
            firstNode.setProperty("name",rows[1] );
            System.out.println(firstNode.getProperty("id"));
            tx.success();
        }
        finally
        {
            tx.finish();
        }
    }

    public void createRelationshipsUsingIndexes(String rows[]){
        Transaction tx = graphDb.beginTx();
        try
        {
            ReadableIndex<Node> autoNodeIndex = graphDb.index().getNodeAutoIndexer().getAutoIndex();
            // node1 and node2 both had auto indexed properties, get them
            firstNode=autoNodeIndex.get( "id", rows[0] ).getSingle();
            secondNode=autoNodeIndex.get( "id", rows[1] ).getSingle();
            relationship = firstNode.createRelationshipTo( secondNode, RelTypes.CO_OCCURRED );
            relationship.setProperty( "frequency", rows[2] );
            relationship.setProperty( "generatability_score", rows[3] );
            tx.success();
        }
        finally
        {
            tx.finish();
        }
    }
Reputation: 41706
What is the memory config (heap) you are using for your import? What OS are you running on (I assume some Linux) and what Neo4j version are you using?
I recommend upgrading to the latest stable version of Neo4j, 2.0.3.
There are a few problems with your import:
Use a BufferedReader around your FileReader for better CSV read performance.
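As a rough illustration of the wrapping, here is a minimal sketch using only the JDK. The class name `BufferedCsvRead`, the method `countLines`, and the 64 KB buffer size are made up for this example; in your code the wrapped reader would be handed to `CSVReader` in place of the bare `FileReader`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class BufferedCsvRead {
    // Reads every line through a BufferedReader wrapped around the given
    // Reader and returns how many lines were seen. The buffer size (64 KB)
    // is an illustrative value, not a tuned one.
    public static int countLines(Reader source) {
        try (BufferedReader in = new BufferedReader(source, 1 << 16)) {
            int lines = 0;
            while (in.readLine() != null) {
                lines++;
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // For a real import, pass new FileReader(path) instead of a StringReader.
        int lines = countLines(new StringReader("1 a X\n2 b Y\n"));
        System.out.println(lines); // prints 2
    }
}
```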
It would make more sense to use my batch-importer for fast initial imports.
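If you stay with the embedded API, note that the code in the question opens and finishes one transaction per row (in createNode and createRelationshipsUsingIndexes) and also prints every row to the console. Each commit carries a fixed cost, so committing in batches is usually much cheaper. Below is a minimal sketch of that pattern. It assumes a hypothetical `Store` interface standing in for the Neo4j calls (`begin` would correspond to `graphDb.beginTx()`, `commit` to `tx.success()` followed by `tx.finish()`); the class name `BatchedCommit` and the batch size are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedCommit {
    /** Hypothetical stand-in for the database calls; not a Neo4j type. */
    public interface Store {
        void begin();             // open a transaction
        void write(String[] row); // create one node or relationship
        void commit();            // mark success and close the transaction
    }

    /**
     * Writes all rows, opening one transaction per batchSize rows instead of
     * one per row. Returns the number of commits performed.
     */
    public static int importRows(Iterable<String[]> rows, Store store, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int commits = 0;
        int pending = 0; // rows written in the currently open transaction
        for (String[] row : rows) {
            if (pending == 0) {
                store.begin();
            }
            store.write(row);
            pending++;
            if (pending == batchSize) {
                store.commit();
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) { // commit the final, partially filled batch
            store.commit();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            rows.add(new String[] { String.valueOf(i) });
        }
        Store noop = new Store() {
            public void begin() { }
            public void write(String[] row) { }
            public void commit() { }
        };
        System.out.println(importRows(rows, noop, 10)); // prints 3
    }
}
```

With 8,000,000 edges, a batch size in the tens of thousands would cut the number of commits from millions to a few hundred; the right value depends on your heap, which is why I asked about the memory config.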