Reputation: 377
When using TinkerPop/JanusGraph I am able to define VertexLabels and property keys, which I can then use to create composite indexes. I read somewhere in the Neptune documentation that indexes are not necessary (or supported).
My question is then: how do I prevent duplication when loading data into the database? The only examples I found in the AWS documentation involve loading data where a unique ID is already provided for each record, which to me suggests I would first need to extract the data from an RDBMS in order to have all the IDs and their relationships before I can load it.
Am I understanding this correctly? If not, how could I solve this?
Upvotes: 6
Views: 2689
Reputation: 271
Yes, your understanding is correct. The uniqueness constraint for vertices and edges applies to their ~id property, i.e. IDs are unique.
There are two ways to insert data into Neptune. You can either use the loader interface (recommended) or insert via Gremlin.
Case#1: Insert via bulk loader (recommended)
The loader only supports the CSV format for now and, as you observed, it requires user-defined IDs for vertices and edges.
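One way to get those IDs without a separate lookup pass is to derive them from keys the RDBMS already has. The following is only a sketch under assumptions: the table name `person`, the `name:String` column, and the helpers `make_id`, `vertices_csv` and `edges_csv` are hypothetical and not part of any Neptune API. It only produces CSV text with the `~id`, `~label`, `~from` and `~to` system columns that the loader format uses. Because the same source row always yields the same `~id`, re-exporting it should address the same vertex rather than create a second one.

```python
import csv
import io


def make_id(table, primary_key):
    """Derive a stable ID from the source table name and primary key (hypothetical scheme)."""
    return f"{table}-{primary_key}"


def vertices_csv(table, label, rows):
    """Render (primary_key, name) rows as loader-style vertex CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["~id", "~label", "name:String"])
    for primary_key, name in rows:
        writer.writerow([make_id(table, primary_key), label, name])
    return buf.getvalue()


def edges_csv(table, label, pairs):
    """Render (from_key, to_key) pairs as loader-style edge CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["~id", "~from", "~to", "~label"])
    for from_key, to_key in pairs:
        src = make_id(table, from_key)
        dst = make_id(table, to_key)
        # The edge ID is derived from its endpoints and label, so it is stable too.
        writer.writerow([f"{src}-{label}-{dst}", src, dst, label])
    return buf.getvalue()


if __name__ == "__main__":
    print(vertices_csv("person", "person", [(1, "Simba"), (2, "Mufasa")]))
    print(edges_csv("person", "knows", [(1, 2)]))
```

With this scheme the edge file can be generated straight from a foreign-key or join table, since both endpoint IDs are computable from the keys alone.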
Case#2: Insert via Gremlin
For insertion via Gremlin, providing IDs is optional. If you do not provide an ID, Neptune automatically assigns a unique ID to the vertex or edge. For example, g.addV() adds a vertex and assigns a unique identifier to it.
Further regarding case#2, you can add the two vertices and the relationship between them in the same query. This does not require knowing the IDs that the database auto-assigned to the vertices.
g.addV().as("node1").property("name","Simba").addV().as("node2").property("name","Mufasa").addE("knows").from("node1").to("node2")
Alternatively, add the vertices first, then look them up by a property that uniquely identifies them and add the edge:
g.addV().property("name","Simba");
g.addV().property("name","Mufasa");
g.V().has("name","Simba").as("node1").V().has("name","Mufasa").as("node2").addE("knows").from("node1").to("node2");
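Note that running the two addV() statements above twice would create two "Simba" vertices, because uniqueness is only enforced on IDs and not on properties. A common TinkerPop idiom for "get or create" is fold().coalesce(unfold(), addV()...), which adds the vertex only when the lookup finds nothing. The sketch below merely renders that query as text; the helper names `gremlin_str` and `get_or_create_vertex` are hypothetical, the quoting rule is an assumption, and how the string is submitted to Neptune is left out.

```python
def gremlin_str(value: str) -> str:
    """Render a Python string as a single-quoted Gremlin string literal (assumed escaping)."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def get_or_create_vertex(key: str, value: str) -> str:
    """Build a Gremlin query that returns the matching vertex, adding it only if absent."""
    k, v = gremlin_str(key), gremlin_str(value)
    return (
        f"g.V().has({k},{v})"
        ".fold()"
        f".coalesce(unfold(), addV().property({k},{v}))"
    )


if __name__ == "__main__":
    print(get_or_create_vertex("name", "Simba"))
```

This check is done by the query, not enforced by the database, so two clients running it at the same moment could still both add the vertex. Supplying your own ID is the only uniqueness that is actually guaranteed.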
Upvotes: 5