Albert S

Reputation: 2602

OrientDB ETL: how to skip a duplicate vertex but create the edge

I am creating a communication graph.
Each message has a msgid and each person has a userid.
I have already created the message vertices; now I want to create the user vertices and an edge connecting each message vertex to its user vertex.
A user can receive multiple messages.
My file contains:
msgid, userid, (and some other info I will assign to the edge)

The issue I am having is that my file contains duplicate userids (because users can get multiple messages). I don't want to create another vertex for a userid that already exists, so I set skipDuplicates. But if I do skip duplicates, the edge does not get created either. I do want multiple edges to the same user vertex, as each edge represents one message.

How do I keep the user vertex unique but still create the edge?

My current ETL .json file, which works fine except for the problem described above:

{
  "source": { "file": { "path": "msgs.txt" } },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": { "separator": "\t" } },
    { "vertex": { "class": "user", "skipDuplicates": true } },
    { "edge": { "class": "sent_to",
                "joinFieldName": "msgid",
                "lookup": "message.id",
                "direction": "in",
                "edgeFields": { "n": "${input.n}" }
              }
    }
  ],
  "loader": {
    "orientdb": {
       "dbURL": "remote:/localhost/databases/communication",
       "dbType": "graph",
       "classes": [
         {"name": "user",    "extends":  "V"},
         {"name": "message", "extends": "V"},
         {"name": "sent_to",     "extends":  "E"}
       ], "indexes": [
         {"class":"user", "fields":["id"], "type":"UNIQUE" }
       ]
    }
  }
}

Upvotes: 2

Views: 593

Answers (1)

Albert S

Reputation: 2602

Okay, here is what I did, and it seemed to work.
First I created the message vertices (as stated in the question above).
Then I created the user vertices.
Then, to create the edges between them, I ran the following ETL on a file that had {userid, msgid, ...}:

{

  "source": { "file": { "path": "msgs1.txt" } },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": {"separator": "\t"} },
    { "merge": {"joinFieldName": "userid", "lookup": "user.id"} },
    { "vertex": { "class": "user", "skipDuplicates": true  } },
    { "edge": { "class": "sent_to",
                "joinFieldName": "msgid",
                "lookup":"message.id",
                "direction": "in",
                "edgeFields": { "n": "${input.n}",  "date": "${input.date}"}
              }
    }

  ],
  "loader": {
    "orientdb": {
       "dbURL": "remote:/localhost/databases/communication",
       "dbType": "graph",
       "classes": [
         {"name": "user",    "extends":  "V"},
         {"name": "message", "extends": "V"},
         {"name": "sent_to",     "extends":  "E"}
       ],
        "indexes": [
       ]
    }
  }
}

This created all the edges, even when more than one edge pointed to the same user.
Hopefully this will help someone.
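
For anyone wondering why this works, here is a rough conceptual sketch in plain Python of what I believe the merge + edge combination is doing. It is not OrientDB code and not how the ETL is actually implemented; the function name `load_edges` and the dict-based "vertices" are made up purely for illustration. The idea is that the merge step reuses the existing user vertex instead of trying to insert a duplicate, so the edge step still runs for every row.

```python
def load_edges(rows, users, messages):
    """Illustrative model only: rows are dicts with 'userid', 'msgid', and extra fields."""
    edges = []
    for row in rows:
        # "merge" step: look up the user by id and reuse it if it already exists
        # (creating it when missing is an assumption made for this sketch)
        user = users.setdefault(row["userid"], {"id": row["userid"]})
        # "edge" lookup: find the message vertex by its id
        message = messages.get(row["msgid"])
        if message is None:
            continue  # assumption: rows with no matching message are skipped
        # "edge" step: runs for every row, so repeated userids get one edge per message
        edges.append({"from": message["id"], "to": user["id"], "n": row.get("n")})
    return edges


# Hypothetical sample data: one user who received two messages
users = {"u1": {"id": "u1"}}
messages = {"m1": {"id": "m1"}, "m2": {"id": "m2"}}
rows = [
    {"userid": "u1", "msgid": "m1", "n": 1},
    {"userid": "u1", "msgid": "m2", "n": 2},
]
edges = load_edges(rows, users, messages)
```

With this sample data there is still a single user vertex, but two edges pointing at it, which is the behaviour I wanted from the ETL.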

Upvotes: 2
