Reputation: 894
I am implementing a graph database for a large inventory management system involving many different kinds of containers and locations. When creating the initial layout, I have {items} that are "containedBy" {boxes} that are "containedBy" {shelves}. The decision I am looking into involves expected location vs actual location.
In our inventory, it is possible that items expected to be contained in a box are not present when the box is opened. This is related to upstream management by vendors. When receiving the manifest, I will generate the vertices in the database along with the edges that represent the items being in the boxes. When the boxes are opened, I will update the database during the receiving process. What I want to know is this: is it better to use two edges, "expectedContainedBy" and "containedBy", to represent possible and actual containment, or would it be better to have a single edge "containedBy" with a property "Present: true/false"?
My question here is not from a preference standpoint, but from efficiency for the purpose of retrieval and analytics. I have looked into this a bit already, but am not sure if searching for a set of edges by a property would be more efficient than searching by edge label, or if the database will grow unreasonably large by having so many edges.
Edit for clarification: Database is an Azure CosmosDB graph database using Gremlin for our query language.
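To make the two options concrete, here is a minimal sketch of the write side for each design, expressed as Gremlin query strings built in Python. The helper names, the ids, and the choice to initialise "present" to false at manifest time are all assumptions for illustration; a partitioned Cosmos DB graph would also need its partition key handled, which is left out here.

```python
def q(value: str) -> str:
    """Quote a string as a Gremlin literal (simplistic escaping, illustration only)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


# Design A: two edge labels.
def manifest_two_labels(item_id: str, box_id: str) -> str:
    # Manifest received: record where the item is expected to be.
    return f"g.V({q(item_id)}).addE('expectedContainedBy').to(g.V({q(box_id)}))"


def confirm_two_labels(item_id: str, box_id: str) -> str:
    # Box opened and item found: add the actual containment edge.
    return f"g.V({q(item_id)}).addE('containedBy').to(g.V({q(box_id)}))"


# Design B: one edge label with a boolean property.
def manifest_single_label(item_id: str, box_id: str) -> str:
    # Manifest received: create the edge, not yet confirmed (assumed default: false).
    return (
        f"g.V({q(item_id)}).addE('containedBy')"
        f".to(g.V({q(box_id)})).property('present', false)"
    )


def confirm_single_label(item_id: str, box_id: str) -> str:
    # Box opened and item found: flip the flag on the existing edge.
    return (
        f"g.V({q(item_id)}).outE('containedBy')"
        f".where(inV().hasId({q(box_id)})).property('present', true)"
    )
```

In this sketch, Design A adds a second edge for every confirmed item, while Design B updates an edge that already exists, so the total edge count differs between the two.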
Upvotes: 0
Views: 234
Reputation: 739
If I flip this problem over, I see a Schedule object. An Item is linked to a Schedule, and the Schedule is linked to all of the locations, past, present, and future, where the item has been, is, and will be stored. Those Location objects (boxes, shelves, etc.) are in turn linked to all of the things that have transited through them. Before items arrive, knowing that they are going to arrive, a schedule can be created with respect to the other active schedules. You can ask "the system" which shelves will be available at 11:15 to store the new arrivals.
Why are some of the vegetables rotting faster than others? You can check the storage history and see if the rotting vegetables share a common storage location or warehouse region.
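As a rough illustration of that storage-history check, here is a small in-memory sketch in plain Python rather than a graph query. The item and location ids are made up, and a schedule is reduced to an ordered list of location ids, which is an assumption about how the Schedule object might be flattened.

```python
from collections import Counter

# Hypothetical schedules: item id -> ordered list of locations it passed through.
schedules = {
    "lettuce-1": ["dock-A", "shelf-3", "box-12"],
    "lettuce-2": ["dock-B", "shelf-3", "box-14"],
    "carrot-1": ["dock-A", "shelf-5", "box-12"],
}


def shared_locations(schedules: dict, item_ids: list) -> list:
    """Return the locations that every one of the given items has passed through."""
    counts = Counter()
    for item_id in item_ids:
        # Count each location once per item, even if the item visited it twice.
        counts.update(set(schedules[item_id]))
    return sorted(loc for loc, n in counts.items() if n == len(item_ids))


# Which locations do the two rotting lettuces have in common?
common = shared_locations(schedules, ["lettuce-1", "lettuce-2"])
```

With the sample data above, `common` contains only "shelf-3", which would be the first place to inspect.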
Upvotes: 1
Reputation: 46226
Without thinking too hard, I'd say that I'd prefer "containedBy" with a boolean "present" property. It feels natural to me, and when I think about the Gremlin you'd likely be writing to query this data, that design should keep the queries quite readable.
As for efficiency, it depends. If you only expect to have ten "containedBy" edges per box, then I don't think there is much to consider in terms of efficiency. On the other hand, tens of thousands of "containedBy" edges per box would probably be a different story. At that point you need to consider the capabilities of your graph database and the types of queries you intend to write. For example, with some (most?) graphs you may find that, with tens of thousands of edges per "box" vertex, it is faster to have two separate labels. Or, if you were using a graph like JanusGraph, which has vertex-centric indices, you might find that adding an index on "present" gets you the performance you want while keeping the clean design of a single "containedBy" label.
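For comparison, here is a sketch of what the read-side traversals might look like under each design, again as Gremlin strings built in Python. The function names and ids are hypothetical, the edge direction (item to box) is taken from the question, and the queries have not been run against Cosmos DB, so treat them as a starting point rather than tested code.

```python
def q(value: str) -> str:
    """Quote a string as a Gremlin literal (simplistic escaping, illustration only)."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


# Single "containedBy" label with a boolean "present" property on the edge.
def present_items_single_label(box_id: str) -> str:
    return f"g.V({q(box_id)}).inE('containedBy').has('present', true).outV()"


def missing_items_single_label(box_id: str) -> str:
    return f"g.V({q(box_id)}).inE('containedBy').has('present', false).outV()"


# Two labels: "expectedContainedBy" and "containedBy".
def present_items_two_labels(box_id: str) -> str:
    return f"g.V({q(box_id)}).in('containedBy')"


def missing_items_two_labels(box_id: str) -> str:
    # Expected in this box, but with no actual containment edge back to it.
    return (
        f"g.V({q(box_id)}).in('expectedContainedBy')"
        f".not(out('containedBy').hasId({q(box_id)}))"
    )
```

With a single label, every query filters the box's edges on a property. With two labels, "what is in the box" is a plain label traversal, while "what is missing" needs an extra check per expected item. Which one is faster depends on the edge counts and on the database, as described above.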
Upvotes: 1