Reputation: 31
I need to load around 5 billion vertices into OrientDB.
The OrientDB documentation claims that each cluster can hold over 9 billion vertices and that up to 10 billion documents can be inserted in a day. However, I have been running the load for several days now and only a very small subset of my data has been loaded.
My model is very simple...
Person
-> Name STRING
-> Score DECIMAL
I have left the OrientDB oetl tool running and it has inserted 500 million records in a week. Records are still being loaded, very slowly, at about 100 rows per second. It started at around 1500 nodes per second but has gradually slowed. My current estimate is a few months to load all the data(!)
Meanwhile, I have written a separate Java app to test bulk inserts:
    final OrientGraphFactory factory = new OrientGraphFactory("plocal:C:/orientdb/databases/test", "X", "X");
    factory.declareIntent(new OIntentMassiveInsert());
    OrientGraphNoTx graph = factory.getNoTx();
    for (int i = 0; i < NUM_INSERTLOOPS; i++)
    {
        OrientVertex v = graph.addVertex("person", "12");
        v.setProperty("Name", "test" + i);
        v.setProperty("Score", 100 * i);
    }
    graph.commit();
This too starts off very well, at 100k nodes in about 8 seconds, but after 10 million or so it suddenly slows to over 5 minutes for the same number of nodes. I stopped it after 24 hours and 50 million nodes.
I am running a local database on a well-spec'ed Windows machine. CPU usage sits around 1% whilst running either process. The oetl tool is using about 15GB of 20GB RAM, the Java program far less.
Is this performance expected, or have I misunderstood something? I am happy to wait several days to load my data, but not several weeks (or months!!)
Upvotes: 2
Views: 1331
Reputation: 1949
To optimize the Java importer in NoTx mode, you can call addVertex with inline properties, like this:
OrientVertex v = graph.addVertex("class:person", "Name","test" + i,"Score", 100 * i);
This calls save only once per vertex.
The next step to improve speed is to use multiple threads.
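As a rough sketch of what the multithreaded step could look like, the class below splits the id range into one slice per worker and runs the slices on a fixed thread pool. The names ParallelLoader, load and insertOne are made up for illustration, and the actual OrientDB call is left as a placeholder callback so the example runs on its own. In a real loader, I would assume each worker gets its own graph instance from the factory rather than sharing one across threads, and does the inline addVertex call inside the loop.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

public class ParallelLoader {

    /**
     * Splits the id range [0, total) into one contiguous slice per worker and
     * calls insertOne for every id. Returns the number of ids processed.
     * insertOne is a hypothetical stand-in for the real vertex insert.
     */
    public static long load(long total, int workers, LongConsumer insertOne)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicLong done = new AtomicLong();
        long slice = (total + workers - 1) / workers; // ceiling division

        for (int w = 0; w < workers; w++) {
            final long from = Math.min(total, w * slice);
            final long to = Math.min(total, from + slice);
            pool.submit(() -> {
                // Assumption: a real loader would open one graph instance
                // per worker here and shut it down after the loop.
                for (long i = from; i < to; i++) {
                    insertOne.accept(i);
                    done.incrementAndGet();
                }
            });
        }

        pool.shutdown(); // accept no new tasks
        pool.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS); // wait for all slices
        return done.get();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong sum = new AtomicLong();
        // Placeholder insert: just adds the id to a running total.
        long n = load(1_000_000, 4, sum::addAndGet);
        System.out.println(n + " records processed");
    }
}
```

The number of workers is something to tune by measuring. With CPU usage near 1%, the load may be limited by disk rather than by threads, in which case extra workers would help less.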
Upvotes: 0