Thomas Koppensteiner

Reputation: 717

How to import Edges from CSV with ETL into OrientDB graph?

I'm trying to import edges from a CSV file into OrientDB. The vertices are stored in a separate file and have already been imported into OrientDB via ETL. My situation is similar to the questions "OrientDB import edges only using ETL tool" and "OrientDB ETL loading CSV with vertices in one file and edges in another".


Update

Friend.csv

"id","client_id","first_name","last_name"
"0","0","John-0","Doe"
"1","1","John-1","Doe"
"2","2","John-2","Doe"
...

The "id" field is removed by the Friend-Importer, but the "client_id" is stored. The idea is to have a known client-side generated id for searching etc.

PeindingFriendship.csv

"friendship_id","client_id","from","to"
"0","0-1","1","0"
"2","0-15","15","0"
"3","0-16","16","0"
...

The "friendship_id" and "client_id" should be imported as attributes of the "PendingFriendship" edge. "from" is the "client_id" of a Friend. "to" is the "client_id" of another Friend. A unique index on "client_id" exists for both Friend and PendingFriendship.


My ETL configuration looks like this:

...
"extractor": {
  "csv": {
  }
},
"transformers": [
  {
    "command": {
      "command": "CREATE EDGE PendingFriendship FROM (SELECT FROM Friend WHERE client_id = '${input.from}') TO (SELECT FROM Friend WHERE client_id = '${input.to}') SET client_id = '${input.client_id}'",
      "output": "edge"
    }
  },
  {
    "field": {
      "fieldName": "from",
      "expression": "remove"
    }
  },
  {
    "field": {
      "fieldName": "to",
      "operation": "remove"
    }
  },
  {
    "field": {
      "fieldName": "friendship_id",
      "expression": "remove"
    }
  },
  {
    "field": {
      "fieldName": "client_id",
      "operation": "remove"
    }
  },
  {
    "field": {
      "fieldName": "@class",
      "value": "PendingFriendship"
    }
  }
],
... 

The issue with this configuration is that it creates two edge entries. One is the expected "PendingFriendship" edge. The second is an empty "PendingFriendship" edge that has all the fields I removed as attributes with empty values. The import fails at the second row/document, because inserting another empty "PendingFriendship" violates a uniqueness constraint. How can I avoid the creation of the unnecessary empty "PendingFriendship"? What is the best way to import edges into OrientDB? All the examples in the documentation use CSV files where vertices and edges are in one file, but this is not the case for me.

I also had a look into the Edge-Transformer, but it returns a Vertex not an Edge!


Upvotes: 1

Views: 532

Answers (1)

Thomas Koppensteiner

Reputation: 717

After some time I found a workaround for importing the above data into OrientDB. Instead of using the ETL tool, I wrote simple Ruby scripts that call the HTTP API of OrientDB using the Batch endpoint.

Steps:

  1. Import the Friends.
  2. Use the response to create a mapping of client_ids to @rids.
  3. Parse the PeindingFriendship.csv and build batch requests.
  4. Create each PendingFriendship with its own command.
  5. Use the mapping from step 2 to insert the @rids into the commands from step 4.
  6. Send the batch requests in chunks of 1000 commands.
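Steps 3 to 6 could look roughly like the following Ruby sketch. It is not the original script: the helper names `edge_command` and `batch_bodies`, the sample @rids, and the decision to set only `client_id` on the edge are assumptions made for illustration. The mapping from step 2 is assumed to be a plain Hash from client_id to @rid.

```ruby
require "csv"
require "json"

# Hypothetical helper: builds the SQL command for one CSV row, resolving the
# "from"/"to" client_ids to record ids through the mapping from step 2.
def edge_command(row, rid_by_client_id)
  from_rid  = rid_by_client_id.fetch(row["from"]) # raises KeyError if the Friend is unknown
  to_rid    = rid_by_client_id.fetch(row["to"])
  client_id = row["client_id"]
  # Assumption: client_ids never contain quotes or backslashes, so no SQL
  # escaping is attempted; such values are rejected instead.
  raise ArgumentError, "unsafe client_id: #{client_id}" if client_id =~ /['\\]/
  "create edge PendingFriendship from #{from_rid} to #{to_rid} set client_id='#{client_id}'"
end

# Hypothetical helper: turns the CSV text into JSON request bodies, each
# holding at most chunk_size commands inside one transaction.
def batch_bodies(csv_text, rid_by_client_id, chunk_size = 1000)
  rows = CSV.parse(csv_text, headers: true)
  rows.each_slice(chunk_size).map do |chunk|
    operations = chunk.map do |row|
      {
        "type"     => "cmd",
        "language" => "sql",
        "command"  => edge_command(row, rid_by_client_id)
      }
    end
    JSON.generate({ "transaction" => true, "operations" => operations })
  end
end

# Usage with two sample rows and made-up record ids.
csv_text = <<~CSV
  "friendship_id","client_id","from","to"
  "0","0-1","1","0"
  "2","0-15","15","0"
CSV
rid_by_client_id = { "0" => "#27:0", "1" => "#27:1", "15" => "#27:15" }

bodies = batch_bodies(csv_text, rid_by_client_id)
puts bodies.first
```

Each element of `bodies` is one request body for the Batch endpoint. Using `fetch` for the lookup makes a missing Friend fail loudly instead of producing a command with an empty @rid.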

Example Batch-Request body:

{
  "transaction" : true,
  "operations" : [
    {
      "type" : "cmd",
      "language" : "sql",
      "command" : "create edge PendingFriendship from #27:178 to #27:179 set client_id='4711'"
    }
  ]
}
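A body like the one above could be sent with Ruby's standard `Net::HTTP`, roughly as sketched below. The helper names `build_batch_request` and `send_batch`, the server address `http://localhost:2480`, the database name `demo`, and the `admin`/`admin` credentials are placeholders, not values from my setup. The sketch assumes the Batch endpoint is reachable at `/batch/<database>` with HTTP basic authentication.

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: builds (but does not send) the POST request for the
# Batch endpoint of the given database.
def build_batch_request(base_url, database, user, password, body)
  uri = URI.join(base_url, "/batch/#{database}")
  request = Net::HTTP::Post.new(uri)
  request.basic_auth(user, password)
  request["Content-Type"] = "application/json"
  request.body = body
  [uri, request]
end

# Hypothetical helper: sends a prepared request and returns the HTTP response.
def send_batch(uri, request)
  Net::HTTP.start(uri.hostname, uri.port) { |http| http.request(request) }
end

body = JSON.generate({
  "transaction" => true,
  "operations"  => [
    {
      "type"     => "cmd",
      "language" => "sql",
      "command"  => "create edge PendingFriendship from #27:178 to #27:179 set client_id='4711'"
    }
  ]
})

uri, request = build_batch_request("http://localhost:2480", "demo", "admin", "admin", body)
puts "#{request.method} #{uri}"

# Uncomment to actually send the request to a running server:
# response = send_batch(uri, request)
# puts response.code
```

Building the request separately from sending it makes it easy to inspect or log a batch before it reaches the server.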

This isn't the answer to the question I asked, but for me it solves the higher-level goal of importing the data into OrientDB. I therefore leave it to the community to decide whether this question should be marked as solved.

Upvotes: 0
