lambdapilgrim

Reputation: 1071

How to use OrientDB ETL to create edges only

I have two CSV files:

The first contains about 500M records in the following format:

id,name
10000023432,Tom User
13943423235,Blah Person

The second contains about 1.5B friend relationships in the following format:

fromId,toId
10000023432,13943423235

I used the OrientDB ETL tool to create vertices from the first CSV file. Now I just need to create edges to establish the friendship connections between them.

I have tried multiple configurations of the ETL JSON file so far, the latest being this one:

{
    "config": {"parallel": true},
    "source": { "file": { "path": "path_to_file" } },
    "extractor": { "csv": {} },
    "transformers": [
        { "vertex": {"class": "Person", "skipDuplicates": true} },
        { "edge": { "class": "FriendsWith",
                    "joinFieldName": "from",
                    "lookup": "Person.id",
                    "unresolvedLinkAction": "SKIP",
                    "targetVertexFields":{
                        "id": "${input.to}"
                    },
                    "direction": "out"
                  }
        },
        { "code": { "language": "Javascript",
                    "code": "print('Current record: ' + record);  record;"}
        }
    ],
    "loader": {
        "orientdb": {
            "dbURL": "remote:<DB connection string>",
            "dbType": "graph",
            "classes": [
                {"name": "FriendsWith", "extends": "E"}
            ], "indexes": [
                {"class":"Person", "fields":["id:long"], "type":"UNIQUE" }
            ]
        }
    }
}

Unfortunately, in addition to creating the edge, this also creates a new vertex with "from" and "to" properties.

When I try removing the vertex transformer, the ETL process throws an error:

Error in Pipeline execution: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not supported
Exception in thread "OrientDB ETL pipeline-0" com.orientechnologies.orient.etl.OETLProcessHaltedException: Halt
        at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:149)
        at com.orientechnologies.orient.etl.OETLProcessor$2.run(OETLProcessor.java:341)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.orientechnologies.orient.etl.transformer.OTransformException: edge: input type 'com.orientechnologies.orient.core.record.impl.ODocument$1$1@40d136a8' is not supported
        at com.orientechnologies.orient.etl.transformer.OEdgeTransformer.executeTransform(OEdgeTransformer.java:107)
        at com.orientechnologies.orient.etl.transformer.OAbstractTransformer.transform(OAbstractTransformer.java:37)
        at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:115)
        ... 2 more

What am I missing here?

Upvotes: 6

Views: 1672

Answers (2)

K.Roland

Reputation: 128

You can import the edges with these ETL transformers:

"transformers": [
    { "merge": { "joinFieldName": "fromId", "lookup": "Person.id" } },
    { "vertex": {"class": "Person", "skipDuplicates": true} },
    { "edge": { "class": "FriendsWith",
                "joinFieldName": "toId",
                "lookup": "Person.id",
                "direction": "out"
              }
    },
    { "field": { "fieldNames": ["fromId", "toId"], "operation": "remove" } }
]

The "merge" transformer joins the current CSV line with the related Person record (this is a bit strange, but for some reason it is necessary in order to associate fromId with the source person).

The "field" transformer removes the CSV fields (fromId and toId) added by the merge step. You can also try the import without the "field" transformer to see the difference.

Upvotes: 7

Alessandro Rota

Reputation: 3570

With the Java API you could read the CSV file yourself and then create the edges:

        // Imports needed: java.io.BufferedReader, java.io.FileReader, java.io.IOException,
        // com.orientechnologies.orient.client.remote.OServerAdmin,
        // com.orientechnologies.orient.core.sql.OCommandSQL,
        // com.tinkerpop.blueprints.impls.orient.OrientGraph
        String nomeYourDb = "nomeYourDb";
        String csvFile = "path_to_file";
        String cvsSplitBy = ",";   // your separator
        try {
            // Check that the database exists before opening a graph on it
            OServerAdmin serverAdmin = new OServerAdmin("remote:localhost/" + nomeYourDb).connect("root", "root");
            boolean dbExists = serverAdmin.existsDatabase();
            serverAdmin.close();
            if (dbExists) {
                OrientGraph g = new OrientGraph("remote:localhost/" + nomeYourDb);
                // try-with-resources closes the reader even if an exception is thrown
                try (BufferedReader br = new BufferedReader(new FileReader(csvFile))) {
                    br.readLine();   // skip the header line (fromId,toId)
                    String line;
                    while ((line = br.readLine()) != null) {
                        String[] ids = line.split(cvsSplitBy);
                        // Look up both Person vertices by id and connect them
                        String personFrom = "(select from Person where id='" + ids[0] + "')";
                        String personTo = "(select from Person where id='" + ids[1] + "')";
                        String query = "create edge FriendsWith from " + personFrom + " to " + personTo;
                        g.command(new OCommandSQL(query)).execute();
                    }
                } finally {
                    g.shutdown();   // release the connection
                }
            }
        } catch (IOException e) {
            // Also covers FileNotFoundException, which is a subclass of IOException
            e.printStackTrace();
        }
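
With 1.5B lines, a malformed row (or the header) going straight into the SQL string could be a problem, so it may be worth validating each line before building the statement. Below is a minimal sketch of that step only. `EdgeStatementBuilder` and `buildCreateEdge` are hypothetical names, not part of OrientDB, and the sketch assumes the ids are purely numeric (as in the sample data and the `id:long` index from the question), which is why it writes them unquoted.

```java
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical helper (not an OrientDB class): one CSV line in, one CREATE EDGE statement out. */
public class EdgeStatementBuilder {

    /**
     * Returns the statement for a line such as "10000023432,13943423235",
     * or Optional.empty() when the line does not hold exactly two numeric ids.
     * The header line "fromId,toId" is rejected by the same check.
     */
    public static Optional<String> buildCreateEdge(String line, String separator) {
        if (line == null) {
            return Optional.empty();
        }
        // Quote the separator so characters like "|" are not treated as regex;
        // limit -1 keeps trailing empty fields so "1,2," counts as three fields.
        String[] ids = line.trim().split(Pattern.quote(separator), -1);
        if (ids.length != 2) {
            return Optional.empty();
        }
        String from = ids[0].trim();
        String to = ids[1].trim();
        // Assumption: ids are plain digits, so anything else is treated as a bad row
        if (!from.matches("\\d+") || !to.matches("\\d+")) {
            return Optional.empty();
        }
        return Optional.of("create edge FriendsWith from (select from Person where id=" + from
                + ") to (select from Person where id=" + to + ")");
    }
}
```

In the loop above you would call `buildCreateEdge(line, cvsSplitBy)` and only run `g.command(new OCommandSQL(...)).execute()` when a statement is present, logging or counting the skipped lines otherwise.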

Upvotes: 1
