Define Graph Schema in AWS Neptune to prevent data duplication

Question

When using TinkerPop/JanusGraph, I can define vertex labels and property keys, which I can then use to create composite indexes. I read somewhere in the Neptune documentation that indexes are not necessary (or not supported).

My question, then, is how do I prevent duplication when loading data into the database? The only examples I found in the AWS documentation involve loading data where a unique ID is already provided for each record. To me, that seems to mean I would first need to extract the data from an RDBMS so that I have all the IDs and their relationships before I can load it.

Am I understanding this correctly? If not, how could I solve this?

Divij Vaidya · Accepted Answer

Yes, your understanding is correct. The uniqueness constraint for vertices and edges applies to their ~id property, i.e. IDs are unique.
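One way to make ID uniqueness work for you, without first exporting every ID from an RDBMS, is to derive each ~id deterministically from a natural key of the record. Neptune does not do this for you; the sketch below assumes every record has a stable natural key (here a label plus a key string) and uses Python's `uuid.uuid5` so the same record always maps to the same ID. The namespace string and function names are hypothetical.

```python
import uuid

# Hypothetical, fixed namespace for this application's IDs; any constant UUID works.
APP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "graph.example.com")


def vertex_id(label: str, natural_key: str) -> str:
    """Derive a stable ~id from a vertex label and a natural key.

    Re-running the extraction on the same source record yields the same ID,
    so the record cannot turn into a second vertex with a different ID.
    """
    return str(uuid.uuid5(APP_NAMESPACE, f"{label}:{natural_key}"))


def edge_id(label: str, from_id: str, to_id: str) -> str:
    """Derive a stable ~id for a directed edge from its label and endpoint IDs."""
    return str(uuid.uuid5(APP_NAMESPACE, f"{label}:{from_id}->{to_id}"))


if __name__ == "__main__":
    simba = vertex_id("person", "Simba")
    mufasa = vertex_id("person", "Mufasa")
    print(simba, mufasa, edge_id("knows", simba, mufasa))
```

Because the IDs come from the data itself, each source table can be exported independently. The trade-off is that the natural key must genuinely be unique and must never change.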

There are two ways to insert data into Neptune: you can either use the bulk loader (recommended) or insert via Gremlin.

Case #1: Insert via bulk loader (recommended)

The bulk loader currently supports only the CSV format and, as you observed, it requires user-defined IDs for vertices and edges.
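To illustrate what the loader expects, here is a minimal sketch that turns rows of hypothetical `(name, knows_name)` pairs into the two CSV files of the Gremlin load format: a vertex file with `~id` and `~label` columns, and an edge file with `~id`, `~from`, `~to` and `~label` columns. It assumes the person's name is a natural key and builds IDs from it (the `person-`/`knows-` prefixes are made up), de-duplicating in memory before anything is written. Check the Neptune load-format documentation for the exact column rules before relying on it.

```python
import csv
import io


def build_load_files(pairs):
    """Return (vertices_csv, edges_csv) strings in Neptune's Gremlin CSV layout.

    `pairs` is an iterable of (name, knows_name) tuples. Vertex IDs are derived
    from the name (assumed unique), so a name that appears in many rows still
    produces exactly one vertex row.
    """
    vertices = {}  # ~id -> name, insertion-ordered, one entry per person
    edges = {}     # ~id -> (from_id, to_id), one entry per relationship
    for name, other in pairs:
        a, b = f"person-{name}", f"person-{other}"
        vertices.setdefault(a, name)
        vertices.setdefault(b, other)
        edges.setdefault(f"knows-{name}-{other}", (a, b))

    v_buf = io.StringIO()
    v = csv.writer(v_buf, lineterminator="\n")
    v.writerow(["~id", "~label", "name:String"])
    for vid, name in vertices.items():
        v.writerow([vid, "person", name])

    e_buf = io.StringIO()
    e = csv.writer(e_buf, lineterminator="\n")
    e.writerow(["~id", "~from", "~to", "~label"])
    for eid, (src, dst) in edges.items():
        e.writerow([eid, src, dst, "knows"])

    return v_buf.getvalue(), e_buf.getvalue()


if __name__ == "__main__":
    rows = [("Simba", "Mufasa"), ("Simba", "Mufasa"), ("Nala", "Simba")]
    vertices_csv, edges_csv = build_load_files(rows)
    print(vertices_csv)
    print(edges_csv)
```

The repeated `("Simba", "Mufasa")` row collapses into one edge and two vertices. Real data would need a separator that cannot occur inside a name, since `knows-{name}-{other}` is ambiguous if names contain hyphens.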

Case #2: Insert via Gremlin

For insertion via Gremlin, providing IDs is optional. If you do not provide an ID, Neptune automatically assigns a unique ID to the vertex or edge. For example, g.addV() adds a vertex and assigns a unique identifier to it.

Also, in case #2 you can add both vertices and the relationship between them in the same query. This does not require knowing the IDs that the database auto-assigns to the vertices.

g.addV().as("node1").property("name","Simba").addV().as("node2").property("name","Mufasa").addE("knows").from("node1").to("node2")

Alternatively, look the vertices up by a property that identifies them uniquely. Note that Neptune does not enforce uniqueness on such a property (only ~id is unique), so your application has to guarantee it:

g.addV().property("name","Simba");
g.addV().property("name","Mufasa");
g.V().has("name","Simba").as("node1").V().has("name","Mufasa").as("node2").addE("knows").from("node1").to("node2");
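Since only ~id is enforced as unique, looking vertices up by name prevents duplicates only if every write checks before it creates. In Gremlin this "get-or-create" step is commonly written with the fold().coalesce(unfold(), addV(...)) idiom. The plain-Python sketch below models just that decision against a hypothetical in-memory store (no Neptune client is involved, and `InMemoryGraph` is not a Neptune API), so the control flow is easy to follow.

```python
import itertools


class InMemoryGraph:
    """Toy stand-in for a graph store; IDs are auto-assigned, like g.addV()."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self.vertices = {}  # vertex id -> properties dict
        self.edges = []     # (label, from_id, to_id)

    def get_or_create_vertex(self, key, value):
        """Return the ID of the vertex whose `key` equals `value`, creating it
        only if none exists (the same decision the coalesce() idiom makes)."""
        for vid, props in self.vertices.items():
            if props.get(key) == value:
                return vid  # found: reuse the existing vertex
        vid = next(self._next_id)  # not found: create with an auto-assigned ID
        self.vertices[vid] = {key: value}
        return vid

    def add_edge(self, label, from_name, to_name):
        """Look both endpoints up by name (creating them if missing), then link them."""
        a = self.get_or_create_vertex("name", from_name)
        b = self.get_or_create_vertex("name", to_name)
        self.edges.append((label, a, b))


if __name__ == "__main__":
    g = InMemoryGraph()
    g.add_edge("knows", "Simba", "Mufasa")
    g.add_edge("knows", "Nala", "Simba")
    print(g.vertices)  # three vertices; "Simba" is stored only once
    print(g.edges)
```

Against a real database, the lookup and the create should happen in the same query; if they are separate round trips, two concurrent writers can both miss the lookup and both create the vertex.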
