Jennifer

OrientDB slow insert

I need to load around 5 billion vertices into OrientDB.

The OrientDB documentation claims that each cluster can hold over 9 billion vertices and that up to 10 billion documents can be inserted in a day. However, I have been running the import for several days now and only a very small subset of my data has been created.

My model is very simple...

Person
-> Name   STRING
-> Score  DECIMAL

1.

I left the OrientDB ETL tool (oetl) running, and in a week it has inserted 500 million records. Records are still being loaded, very slowly, at about 100 rows per second. It started at around 1,500 nodes per second but has gradually slowed down. My current estimate is that it will take a few months to load all the data(!)
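The estimate can be checked with the figures above: about 5 billion vertices in total, 500 million loaded in the first week, and about 100 rows per second now. A minimal Java sketch, assuming the rate stays constant from here on; `LoadEta` and `daysToLoad` are illustrative names only:

```java
public class LoadEta {
    static final long SECONDS_PER_DAY = 86_400L;

    /** Days needed to insert `remaining` records at a steady `ratePerSecond`. */
    static double daysToLoad(long remaining, double ratePerSecond) {
        return remaining / ratePerSecond / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        long target = 5_000_000_000L; // about 5 billion vertices in total
        long loaded = 500_000_000L;   // loaded by oetl in the first week
        long remaining = target - loaded;

        // Average rate over the first week (7 days)
        double avgRate = loaded / (7.0 * SECONDS_PER_DAY);
        System.out.printf("average rate: %.0f records/s%n", avgRate);                                // 827
        System.out.printf("remaining at average rate: %.0f days%n", daysToLoad(remaining, avgRate)); // 63
        System.out.printf("remaining at 100 records/s: %.0f days%n", daysToLoad(remaining, 100));    // 521
    }
}
```

At the first week's average rate the remaining 4.5 billion records take about 63 days; at the current 100 rows per second they take about 521 days, so the outcome depends heavily on which rate the loader settles at.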

2.

Meanwhile, I have written a separate Java app to test bulk inserts:

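import com.orientechnologies.orient.core.intent.OIntentMassiveInsert;
import com.tinkerpop.blueprints.impls.orient.OrientGraphFactory;
import com.tinkerpop.blueprints.impls.orient.OrientGraphNoTx;
import com.tinkerpop.blueprints.impls.orient.OrientVertex;

final OrientGraphFactory factory = new OrientGraphFactory("plocal:C:/orientdb/databases/test", "X", "X");
factory.declareIntent(new OIntentMassiveInsert());

OrientGraphNoTx graph = factory.getNoTx();
try {
    for (int i = 0; i < NUM_INSERTLOOPS; i++) {
        // The "class:" prefix stores the vertex in the Person class
        OrientVertex v = graph.addVertex("class:Person");
        // Without a transaction, each setProperty call saves the vertex again
        v.setProperty("Name", "test" + i);
        v.setProperty("Score", 100 * i);
    }
} finally {
    // A NoTx graph has no transaction to commit; just release the graph and the factory
    graph.shutdown();
    factory.close();
}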

This too starts off very well, inserting 100k nodes in about 8 seconds, but after 10 million or so it suddenly slows to over 5 minutes per 100k nodes. I stopped it after 24 hours and 50 million nodes.
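To see exactly where the rate drops, the insert loop can report its throughput for every batch of 100k records, the batch size used in the timings above. A minimal, self-contained sketch: `ThroughputProbe`, `measureBatches` and the `insertOne` callback are hypothetical names, and the dummy workload in `main` stands in for the real OrientDB calls:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

public class ThroughputProbe {
    /**
     * Calls insertOne for every index in [0, total) and records the rate
     * (records per second) of each completed batch of batchSize records.
     * insertOne is a hypothetical stand-in for the real insert code.
     */
    static List<Double> measureBatches(long total, int batchSize, LongConsumer insertOne) {
        List<Double> rates = new ArrayList<>();
        long batchStart = System.nanoTime();
        for (long i = 0; i < total; i++) {
            insertOne.accept(i);
            if ((i + 1) % batchSize == 0) {
                long now = System.nanoTime();
                double seconds = Math.max(now - batchStart, 1) / 1e9; // avoid dividing by zero
                double rate = batchSize / seconds;
                rates.add(rate);
                System.out.printf("%,d records inserted, last batch at %.0f records/s%n", i + 1, rate);
                batchStart = now;
            }
        }
        return rates;
    }

    public static void main(String[] args) {
        // Dummy workload in place of real inserts, just to show the report format
        measureBatches(500_000, 100_000, i -> Long.toString(i).hashCode());
    }
}
```

In the real loader, `insertOne` would wrap the `addVertex`/`setProperty` calls, so the printed log shows the batch at which throughput starts to fall.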

I am running a local database on a well-specced Windows machine. CPU usage sits around 1% whilst either process is running. The oetl process uses about 15GB of the 20GB of RAM; the Java program uses far less.

Is this performance expected, or have I misunderstood something? I am happy to wait several days to load my data, but not several weeks (or months!).

Answers (1)

wolf4ood

To optimize the Java importer in NoTx mode, you can call addVertex with the properties passed inline, like this:

OrientVertex v = graph.addVertex("class:person", "Name","test" + i,"Score", 100 * i);

This calls save only once per vertex, instead of once for addVertex and again for each setProperty call.
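When the properties come from a row rather than from literals, the same single-save call can be fed by a helper that flattens a map into the alternating key/value arguments that addVertex takes. A sketch only: `inlineProps` is a hypothetical helper, and holding each row in a `Map` is an assumption about how the input is read:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class InlineProps {
    /** Flattens {k1=v1, k2=v2} into [k1, v1, k2, v2], the varargs shape used by addVertex(id, props...). */
    static Object[] inlineProps(Map<String, ?> props) {
        Object[] keyValues = new Object[props.size() * 2];
        int i = 0;
        for (Map.Entry<String, ?> e : props.entrySet()) {
            keyValues[i++] = e.getKey();
            keyValues[i++] = e.getValue();
        }
        return keyValues;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new LinkedHashMap<>(); // keeps insertion order
        row.put("Name", "test1");
        row.put("Score", 100);
        // With OrientDB on the classpath: graph.addVertex("class:person", inlineProps(row));
        System.out.println(Arrays.toString(inlineProps(row))); // [Name, test1, Score, 100]
    }
}
```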

The next step to improve speed is to use multiple threads.
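A minimal sketch of splitting the load across threads, assuming each worker would open its own graph from the shared OrientGraphFactory rather than sharing one graph instance between threads. `ParallelLoader`, `RangeInserter` and `insertRange` are hypothetical names, and `main` only counts records instead of inserting them, so the sketch runs without OrientDB:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

public class ParallelLoader {
    /** Hypothetical per-thread work, e.g. factory.getNoTx() plus an insert loop over [from, to). */
    interface RangeInserter {
        void insertRange(long fromInclusive, long toExclusive) throws Exception;
    }

    /** Splits [0, total) into one contiguous slice per thread and runs the slices in parallel. */
    static void loadInParallel(long total, int threads, RangeInserter inserter) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            long chunk = (total + threads - 1) / threads; // ceiling division so no record is left out
            for (long from = 0; from < total; from += chunk) {
                final long start = from;
                final long end = Math.min(from + chunk, total);
                futures.add(pool.submit(() -> {
                    inserter.insertRange(start, end);
                    return null;
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // waits for the slice and rethrows any worker failure
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicLong inserted = new AtomicLong();
        // Stand-in for real inserts: count the records each slice would write
        loadInParallel(1_000_003L, 4, (from, to) -> inserted.addAndGet(to - from));
        System.out.println("inserted " + inserted.get()); // prints "inserted 1000003"
    }
}
```

Contiguous slices keep each worker's loop as simple as the single-threaded one; the number of threads worth using depends on the machine and the storage, so it is left as a parameter.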
