TxAG98
TxAG98

Reputation: 1120

OrientDB ETL create edge using multiple fields in match criteria

I have some data that I'm tracking that looks something like this:

node.csv
Label1,Label2
Alpha,A
Alpha,B
Alpha,C
Bravo,A
Bravo,B

The pair Label1 and Label2 define a unique entry in this data set.

I have another table that has some values in it that I want to link to the vertices created in Table1:

data.csv
Label1,Label2,Data
Alpha,A,10
Alpha,A,20
Alpha,B,30
Bravo,A,99

I'd like to generate edges from entries in Data to Node when both Label1 and Label2 fields match in each.

In this case, I'd have:

Data(Alpha,A,10) ---> Node(Alpha,A)
Data(Alpha,A,20) ---> Node(Alpha,A)
Data(Alpha,B,30) ---> Node(Alpha,B)
Data(Bravo,A,99) ---> Node(Bravo,A)

In another question it appears that this issue gets solved by simply adding an extra "joinFieldName" entry into the json file, but I'm not getting the same result with my data.

My node.json file looks like:

{
    "config": { "log": "info" },
    "source": { "file": { "path": "./node.csv" } },
    "extractor": { "csv": {} },
    "transformers": [ { "vertex": { "class": "Node" } } ],
    "loader": {
        "orientdb": {
            "dbURL": "plocal:test.orientdb",
            "dbType": "graph",
            "batchCommit": 1000,
            "classes": [ {"name": "Node", "extends": "V"} ],
            "indexes": []
        }
    }
}

and my data.json file looks like this:

{
    "config": { "log": "info" },
    "source": { "file": { "path": "./data.csv" } },
    "extractor": { "csv": { } },
    "transformers": [
            { "vertex": { "class": "Data" } },
            { "edge":   { "class":         "Source",
                          "joinFieldName": "Label1",
                          "lookup":        "Node.Label1",
                          "joinFieldName": "Label2",
                          "lookup":        "Node.Label2",
                          "direction":     "in"
                        }
            }
        ],
    "loader": {
        "orientdb": {
            "dbURL": "plocal:test.orientdb",
            "dbType": "graph",
            "batchCommit": 1000,
            "classes": [ {"name": "Data",   "extends": "V"},
                         {"name": "Source", "extends": "E"}
                       ],
            "indexes": []
        }
    }
}

After I run these, I get this output when I query the result:

orientdb {db=test.orientdb}> SELECT FROM V

+----+-----+------+------+------+-------------------+----+-------------+
|#   |@RID |@CLASS|Label1|Label2|out_Source         |Data|in_Source    |
+----+-----+------+------+------+-------------------+----+-------------+
|0   |#25:0|Node  |Alpha |A     |[#41:0,#43:0,#47:0]|    |             |
|1   |#26:0|Node  |Alpha |B     |[#45:0]            |    |             |
|2   |#27:0|Node  |Alpha |C     |                   |    |             |
|3   |#28:0|Node  |Bravo |A     |[#42:0,#44:0,#48:0]|    |             |
|4   |#29:0|Node  |Bravo |B     |[#46:0]            |    |             |
|5   |#33:0|Data  |Alpha |A     |                   |10  |[#41:0,#42:0]|
|6   |#34:0|Data  |Alpha |A     |                   |20  |[#43:0,#44:0]|
|7   |#35:0|Data  |Alpha |B     |                   |30  |[#45:0,#46:0]|
|8   |#36:0|Data  |Bravo |A     |                   |99  |[#47:0,#48:0]|
+----+-----+------+------+------+-------------------+----+-------------+

9 item(s) found. Query executed in 0.012 sec(s).

This is incorrect. I don't want Edges #42:0, #44:0, #46:0 and #47:0:

#42:0 connects Node(Bravo,A) and Data(Alpha,A)
#44:0 connects Node(Bravo,A) and Data(Alpha,A)
#46:0 connects Node(Bravo,B) and Data(Alpha,B) 
#47:0 connects Node(Alpha,A) and Data(Bravo,A)

It looks like adding multiple joinFieldName entries in the transformer is resulting in an OR operation, but I'd like an 'AND' here.

Does anyone know how to fix this? I'm not sure what I'm doing differently than the other StackOverflow question...

Upvotes: 1

Views: 425

Answers (1)

RP-
RP-

Reputation: 5837

After debugging the ETL code, I figured out a workaround. As you said, there is no way to make the multiple joinFieldNames forms one edge. Each joinFieldName will create an edge.

What you can do is, generate an extra column in the CSV file by concatenating "Label1" and "Label2" and use lookup query in edge transformation, something like, assume your data.csv has one extra field like label1_label2 and the values of that field are something like "label1====label2`.

Your edge transformation should have the following

{ "edge":   { "class": "Source",
              "joinFieldName": "label1_label2",
              "lookup": "select expand(n) from (match {class: Node, as: n} return n) where n.Label1+'===='+n.Label2 = ?",
              "direction": "in"
            }
 }

Don't forget to expand the vertex otherwise, ETL thinks that it is a Document. The trick here is to write one query by concatenating multiple fields and passing the equivalent joinFieldName.

Upvotes: 1

Related Questions