Sravan

Reputation: 553

How does representing all information as Nodes vs. Attributes affect storage and computation?

When using a graph database (Neo4j in my case), the same information can be represented in many ways: either make each entity a Node and connect all entities through relationships, or add the entities to the attribute list of a Node.

The following are two different representations of the same data:

1. All info represented as Nodes
2. Info represented as Properties of specific Nodes

Overall, which approach is suitable under which conditions?

My use case involves traversing the database from different nodes up to a depth of 4 and examining the information through connected nodes or attributes (depending on the approach). One query of interest might be, "Who are the friends of John who went to Stanford?"

What is the difference in terms of storage and computation?

Upvotes: 0

Views: 586

Answers (2)

Peter Neubauer

Reputation: 6331

Normally, properties are loaded lazily and are more expensive to hold in cache, especially strings. Nodes and relationships are the most efficient for traversal, especially since relationship types are stored together with the relationship records and thus don't trigger property loads when used in traversals.

Also, a balanced graph (that is, one without many dense nodes with over, say, 10K relationships) is the most efficient to traverse.

I would try to model most of the recurring properties as nodes connected to the entities, thus using the graph itself as an index on these values, instead of having to resort to filtering on property values or looking the property up through an expensive index.
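
To illustrate the idea of using the graph itself as an index, here is a minimal in-memory sketch in plain Python. It is not Neo4j code and says nothing about Neo4j's actual storage layout; the names `people_props`, `attended`, `attended_by_scan`, and `attended_by_traversal` and the sample data are made up for illustration. It only contrasts filtering on a property (every person is inspected) with following relationships from a shared value node (only connected persons are touched).

```python
from collections import defaultdict

# Hypothetical sample data: school stored as a property on each person.
people_props = {
    "John": {"school": "Stanford"},
    "Mary": {"school": "Stanford"},
    "Bob": {"school": "MIT"},
}


def attended_by_scan(school):
    """Property model: inspect every person's properties and filter."""
    return sorted(
        name for name, props in people_props.items()
        if props.get("school") == school
    )


# Node model: each school is its own node, and the "attended" relationships
# are kept as adjacency sets keyed by the school node.
attended = defaultdict(set)
for name, props in people_props.items():
    attended[props["school"]].add(name)


def attended_by_traversal(school):
    """Node model: start at the school node and follow its relationships."""
    return sorted(attended.get(school, set()))


print(attended_by_scan("Stanford"))
print(attended_by_traversal("Stanford"))
```

Both functions return the same people; the difference is that the scan cost grows with the total number of persons, while the traversal cost grows only with the number of relationships on the school node.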

Upvotes: 1

Luanne

Reputation: 19373

The first model (everything as nodes) is much better, since you're querying on entities such as Stanford, and that entity is related to many person nodes. In my opinion, modeling as nodes is more intuitive and easier to query. "Find all persons who went to Stanford" would not be very easy to do in your second model (properties), as you don't have a place to start traversing from.

I'd use attributes mainly to describe the node/entity and to filter results from the query, e.g. "Who are the friends of John who went to Stanford in the year 2010?" In this case, the year attribute would just be used to trim the results. It depends on your use case: if the year is really important and drives a lot of queries, or is used to represent a timeline, you could even model the year as a node attached to Stanford.
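
As a rough sketch of that split between traversal and filtering, the plain Python below models friendships and attendance as adjacency structures and keeps the year as an attribute on the attendance relationship. The data, the `friends` and `attended` structures, and the `friends_who_attended` helper are all hypothetical, and this is an in-memory illustration rather than a Neo4j query.

```python
# Hypothetical sample graph.
# friends: person -> set of friends (FRIEND relationships).
friends = {
    "John": {"Mary", "Bob", "Alice"},
}

# attended: school node -> {person: year}, where the year is an attribute
# on the "attended" relationship.
attended = {
    "Stanford": {"John": 2010, "Mary": 2010, "Alice": 2008},
    "MIT": {"Bob": 2010},
}


def friends_who_attended(person, school, year=None):
    """Traverse from the person and from the school node, then intersect.

    The optional year is used only to trim the already-small result set.
    """
    school_people = attended.get(school, {})
    candidates = friends.get(person, set()) & set(school_people)
    if year is not None:
        candidates = {p for p in candidates if school_people[p] == year}
    return sorted(candidates)


print(friends_who_attended("John", "Stanford"))
print(friends_who_attended("John", "Stanford", year=2010))
```

The traversal finds the candidates through the relationships, and the attribute check runs only over those candidates rather than over the whole data set.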

Upvotes: 0
