I'm trying to find the fastest way to import edges into an OrientDB graph from a CSV file. (My OrientDB version is 2.1.15.)
I currently have a graph with 100k vertices and 1.5M edges. Soon I will grow it to 100M vertices and 100B+ edges, and I don't want to wait months for the import to finish :)
I've tried several approaches:
1. Default JSON ETL. The edge load rate is about 200-300 rows/sec, which is very slow: the import takes about 1.5 h. Changing the "tx" mode and other properties made no difference in performance.
2. Java code using the BatchGraph class. I tried different transaction buffer sizes; the best performance was with a size of 10. It is still too slow for me: about 45 min.
3. Importing a special JSON format from the console (IMPORT DATABASE command). This suits my task less well than the previous two, and it is also very slow: about 1 h.
So, is there any way to import a graph like this (1.5M edges) into OrientDB in a short time? Ideally in less than 1 minute. Please tell me if I can improve my code somehow.
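To put the goal in numbers, here is a small back-of-the-envelope sketch that uses only the figures above (1.5M edges, the reported run times, the 1-minute target and the planned 100B edges). `ImportRate`, `rate` and `duration` are hypothetical helper names for illustration; nothing here touches OrientDB.

```java
// Back-of-the-envelope throughput check using the figures reported in the question.
class ImportRate {
    // Edges per second implied by importing `edges` in `seconds`.
    static double rate(long edges, double seconds) {
        return edges / seconds;
    }

    // Seconds needed to import `edges` at `edgesPerSecond`.
    static double duration(long edges, double edgesPerSecond) {
        return edges / edgesPerSecond;
    }

    public static void main(String[] args) {
        long edges = 1_500_000L;
        System.out.printf("ETL (~1.5 h):            %.0f edges/s%n", rate(edges, 1.5 * 3600));
        System.out.printf("BatchGraph (~45 min):    %.0f edges/s%n", rate(edges, 45 * 60));
        System.out.printf("IMPORT DATABASE (~1 h):  %.0f edges/s%n", rate(edges, 3600));
        double target = rate(edges, 60);
        System.out.printf("Target (1 min):          %.0f edges/s%n", target);
        // Same target rate applied to the planned 100B+ edges, converted to days.
        System.out.printf("100B edges at target:    %.1f days%n",
                duration(100_000_000_000L, target) / 86_400);
    }
}
```

The three reported runs work out to roughly 280, 560 and 420 edges/s, while the 1-minute goal needs 25,000 edges/s. Even at that rate, 100B edges would take about 46 days.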
My ETL JSON configuration:
{
  "source": { "file": { "path": "/opt/orientdb/orientdb-community-2.1.15/bin/csv/1_1500k_edges.csv" } },
  "extractor": { "csv": {} },
  "transformers": [
    { "merge": { "joinFieldName": "ids", "lookup": "V.id" } },
    { "vertex": { "class": "V" } },
    { "edge": {
        "class": "Edges",
        "joinFieldName": "ide",
        "lookup": "V.id",
        "direction": "out",
        "edgeFields": { "val": "${input.val}" },
        "unresolvedLinkAction": "CREATE"
    } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:localhost/graph",
      "dbType": "graph",
      "wal": false,
      "tx": true,
      "batchCommit": 1000,
      "standardElementConstraints": false,
      "classes": [
        { "name": "V" },
        { "name": "Edges", "extends": "E" }
      ],
      "indexes": [
        { "class": "V", "fields": ["id:integer"], "type": "UNIQUE" }
      ]
    }
  }
}
Java code:
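import com.orientechnologies.orient.core.intent.OIntentMassiveInsert;
import com.tinkerpop.blueprints.Edge;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.orient.OrientGraph;
import com.tinkerpop.blueprints.util.wrappers.batch.BatchGraph;
import com.tinkerpop.blueprints.util.wrappers.batch.VertexIDType;

this.graph = new OrientGraph(this.host, this.name, this.pass);
this.graph.setStandardElementConstraints(false);
this.graph.declareIntent(new OIntentMassiveInsert());
BatchGraph<OrientGraph> bgraph = new BatchGraph<OrientGraph>(this.graph, VertexIDType.NUMBER, buff);
bgraph.setVertexIdKey("id");

// For each CSV line: parse the two vertex ids into id[0], id[1] and the edge property into val, then:
Vertex[] vertices = new Vertex[2];
for (int i = 0; i < 2; i++) {
    vertices[i] = bgraph.getVertex(id[i]);
    if (vertices[i] == null) vertices[i] = bgraph.addVertex(id[i]);
}
Edge edge = bgraph.addEdge(null, vertices[0], vertices[1], "Edges");
edge.setProperty("val", val);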
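The CSV parsing step is only described, not shown, in the snippet above. A minimal sketch of it follows, assuming each line has the three comma-separated columns used in the ETL config (`ids`, `ide`, `val`), integer vertex ids, a header row (implied by the ETL config referring to the columns by name), and no quoted fields. `EdgeCsv`, `EdgeRow`, `parse` and `forEachRow` are hypothetical names. The ids are parsed as numbers because the `BatchGraph` above is created with `VertexIDType.NUMBER`, which, as far as I know, expects `Number` ids rather than strings.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical helper: streams "ids,ide,val" rows from the edge CSV one line at a time.
class EdgeCsv {
    // One parsed row: id[0] = source vertex id, id[1] = target vertex id, val = edge property.
    static final class EdgeRow {
        final long[] id;
        final String val;

        EdgeRow(long from, long to, String val) {
            this.id = new long[] {from, to};
            this.val = val;
        }
    }

    // Assumes plain comma-separated integer ids and no quoted fields or embedded commas.
    static EdgeRow parse(String line) {
        String[] f = line.split(",", 3);
        if (f.length < 3) {
            throw new IllegalArgumentException("expected 3 columns (ids,ide,val): " + line);
        }
        return new EdgeRow(Long.parseLong(f[0].trim()), Long.parseLong(f[1].trim()), f[2].trim());
    }

    // Skips the header line, hands each non-empty data row to `sink`, and returns the row count.
    // Rows are streamed rather than collected, so memory use does not grow with file size.
    static long forEachRow(BufferedReader in, Consumer<EdgeRow> sink) {
        try {
            long count = 0;
            String line = in.readLine(); // header: ids,ide,val (ignored)
            while ((line = in.readLine()) != null) {
                if (line.isEmpty()) continue;
                sink.accept(parse(line));
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `EdgeCsv.forEachRow(Files.newBufferedReader(Paths.get(path)), row -> { ... })`, with the vertex lookup and `addEdge` code from the snippet above inside the lambda, using `row.id[0]`, `row.id[1]` and `row.val`. The `long` ids autobox to `Long` when passed to `bgraph.getVertex`.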
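I think the only way to get the import down to ~1 min is to work in plocal mode, i.e. open the database files directly from your JVM instead of going through the remote protocol: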
this.graph = new OrientGraph("plocal:/physical/path/to/db/dir", this.name, this.pass);
If it's a one-shot import, you can just run it from a Java program. If it's a recurring operation and you need it to run against a standalone server instance, you can define a server-side function that does the import and expose it through a server plugin:
http://orientdb.com/docs/2.0/orientdb.wiki/Extend-Server.html