Best practices for Edge/Properties management

Question

I am implementing a graph database for a large inventory management system involving many different kinds of containers and locations. When creating the initial layout, I have {items} that are "containedBy" {boxes} that are "containedBy" {shelves}. The decision I am looking into involves expected location vs actual location.

In our inventory, items expected to be in a box are sometimes not present when the box is opened. This is related to upstream management by vendors. When I receive the manifest, I generate the vertices in the database along with the edges that represent the items being in the boxes. When the boxes are opened, I update the database as part of the receiving process. What I want to know is this: is it better to use two edge labels, "expectedContainedBy" and "containedBy", to represent expected and actual containment, or a single "containedBy" edge with a property "Present: true/false"?

My question here is not from a preference standpoint, but from efficiency for the purpose of retrieval and analytics. I have looked into this a bit already, but am not sure if searching for a set of edges by a property would be more efficient than searching by edge label, or if the database will grow unreasonably large by having so many edges.

Edit for clarification: The database is an Azure CosmosDB graph database, and we use Gremlin as our query language.

stephen mallette · Accepted Answer

Without thinking too hard, I'd say that I'd prefer "containedBy" with a boolean "present" property. It feels natural to me, and when I think about the Gremlin you'd likely be writing to query this data, that design should keep the queries quite readable.
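As a rough illustration of the single-label design, here is a toy in-memory model in plain Python. It is not Gremlin and not Cosmos DB; the `Edge` class, the helper names, the ids, and the choice to start every manifest edge at `present = False` until receiving confirms it are all assumptions made for the sketch. The comments note roughly which Gremlin traversal each helper stands in for.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    label: str
    out_v: str                      # the contained item
    in_v: str                       # the container
    props: dict = field(default_factory=dict)


# Manifest received: one containedBy edge per listed item.
# Assumption: "present" starts as False and is flipped during receiving.
edges = [
    Edge("containedBy", "item-1", "box-1", {"present": False}),
    Edge("containedBy", "item-2", "box-1", {"present": False}),
    Edge("containedBy", "item-3", "box-1", {"present": False}),
]


def confirm(edges, item, box):
    """Mark an item as actually found in the box.

    Roughly: g.V(item).outE('containedBy').where(inV().hasId(box))
                      .property('present', true)
    """
    for e in edges:
        if e.label == "containedBy" and e.out_v == item and e.in_v == box:
            e.props["present"] = True


def items_in(edges, box, present):
    """List items on containedBy edges into `box` with the given flag.

    Roughly: g.V(box).inE('containedBy').has('present', present).outV()
    """
    return [
        e.out_v
        for e in edges
        if e.label == "containedBy"
        and e.in_v == box
        and e.props.get("present") is present
    ]


# Receiving: the box is opened and item-2 turns out to be absent.
confirm(edges, "item-1", "box-1")
confirm(edges, "item-3", "box-1")

found = items_in(edges, "box-1", True)      # ['item-1', 'item-3']
missing = items_in(edges, "box-1", False)   # ['item-2']
```

With this shape, receiving only updates a property on an existing edge, and the expected-versus-actual question becomes a filter on one label.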

As for efficiency, it depends. If you only expect to have ten "containedBy" edges per box, then I don't think there is much to consider in terms of efficiency. On the other hand, tens of thousands of "containedBy" edges would probably be a different story. At that point you need to consider the capabilities of your graph database and the types of queries you intend to write. For example, with some (most?) graphs you may see that, for tens of thousands of edges per "box" vertex, it will be faster to have two separate labels. Or perhaps, if you were using a graph like JanusGraph, which has vertex-centric indices, you might find that adding an index on "present" gets you the performance you desire while keeping the clean design of a single "containedBy" label.
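To make the edge-count argument concrete, the following toy Python sketch counts how many edges a naive store would have to examine to find the missing items on one heavily loaded box. It is only a model: real engines, Cosmos DB included, may behave differently. The figures (10,000 edges, 50 missing), the `scan` helper, and the dict standing in for a vertex-centric index are hypothetical, and the two-label variant assumes that confirming an item moves its edge from "expectedContainedBy" to "containedBy".

```python
from collections import defaultdict

N_ITEMS = 10_000     # edges on one box vertex (illustrative figure)
N_MISSING = 50       # items on the manifest that receiving did not find

items = [f"item-{i}" for i in range(N_ITEMS)]
missing = set(items[:N_MISSING])

# Design 1: one label, a boolean "present" property on every edge.
single_label = {
    "containedBy": [(it, {"present": it not in missing}) for it in items]
}

# Design 2: two labels. Assumption: a confirmed item's edge is moved to
# containedBy, so only unconfirmed items keep expectedContainedBy.
two_labels = {
    "expectedContainedBy": [(it, {}) for it in items if it in missing],
    "containedBy": [(it, {}) for it in items if it not in missing],
}


def scan(adjacency, label, keep=lambda props: True):
    """Return (matching items, number of edges examined) for one label."""
    bucket = adjacency.get(label, [])
    return [it for it, props in bucket if keep(props)], len(bucket)


# Design 3: single label plus a toy stand-in for a vertex-centric index
# keyed on (label, present), so a lookup skips non-matching edges.
index = defaultdict(list)
for it, props in single_label["containedBy"]:
    index[("containedBy", props["present"])].append(it)

found_1, examined_1 = scan(single_label, "containedBy",
                           lambda p: not p["present"])
found_2, examined_2 = scan(two_labels, "expectedContainedBy")
found_3 = index[("containedBy", False)]
examined_3 = len(found_3)

print(examined_1, examined_2, examined_3)   # 10000 50 50
```

All three approaches return the same 50 items. The difference is that the unindexed property filter walks every "containedBy" edge on the vertex, which is negligible at ten edges per box and noticeable at tens of thousands.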
