Reputation: 717
I'm trying to import edges from a CSV-file into OrientDB. The vertices are stored in a separate file and already imported via ETL into OrientDB. So my situation is similar to OrientDB import edges only using ETL tool and OrientDB ETL loading CSV with vertices in one file and edges in another.
Update
Friend.csv
"id","client_id","first_name","last_name"
"0","0","John-0","Doe"
"1","1","John-1","Doe"
"2","2","John-2","Doe"
...
The "id"
field is removed by the Friend-Importer, but the "client_id"
is stored. The idea is to have a known client-side generated id
for searching etc.
PeindingFriendship.csv
"friendship_id","client_id","from","to"
"0","0-1","1","0"
"2","0-15","15","0"
"3","0-16","16","0"
...
The "friendship_id"
and "client_id"
should be imported as attributes of the "PendingFriendship"
edge. "from"
is a "client_id"
of a Friend. "to"
is a "client_id"
of another Friend.
For "client_id"
exists a unique Index on both Friend
and PendingFriendship
.
My ETL configuration looks like this
...
"extractor": {
"csv": {
}
},
"transformers": [
{
"command": {
"command": "CREATE EDGE PendingFriendship FROM (SELECT FROM Friend WHERE client_id = '${input.from}') TO (SELECT FROM Friend WHERE client_id = '${input.to}') SET client_id = '${input.client_id}'",
"output": "edge"
}
},
{
"field": {
"fieldName": "from",
"expression": "remove"
}
},
{
"field": {
"fieldName": "to",
"operation": "remove"
}
},
{
"field": {
"fieldName": "friendship_id",
"expression": "remove"
}
},
{
"field": {
"fieldName": "client_id",
"operation": "remove"
}
},
{
"field": {
"fieldName": "@class",
"value": "PendingFriendship"
}
}
],
...
The issue with this configuration is that it creates two edge entries. One is the expected "PendingFriendship" edge. The second one is an empty "PendingFriendship" edge, with all the fields I removed as attributes with empty values. The import fails, at the second row/document, because another empty "PendingFriendship" cannot be inserted because it violates a uniqueness constraint. How can I avoid the creation of the unnecessary empty "PendingFriendship". What is the best way to import edges into OrientDB? All the examples in the documentation use CSV files where vertices and edges are in one file, but this is not the case for me.
I also had a look into the Edge-Transformer, but it returns a Vertex not an Edge!
Upvotes: 1
Views: 532
Reputation: 717
After some time I found a way (workaround) to import the above data into OrientDB. Instead of using the ETL Tool I wrote simple ruby scripts which call the HTTP API of OrientDB using the Batch endpoint.
Steps:
client_ids
to @rids
.PeindingFriendship.csv
and build batch
requests.@rids
into the command from 4.batch
requests in junks of 1000 commands.Example Batch-Request body:
{
"transaction" : true,
"operations" : [
{
"type" : "cmd",
"language" : "sql",
"command" : "create edge PendingFriendship from #27:178 to #27:179 set client_id='4711'"
}
]
}
This isn't the answer to the question I asked, but it solves the higher goal of importing data into OrientDB, for me. Therefore I leave it open for the community to mark this question as solved or not.
Upvotes: 0