Tony Ennis

Neo4j labels and properties, and their differences

Say we have a Neo4j database with several 50,000-node subgraphs. Each subgraph has a root. I want to find all the nodes in one subgraph.

One way would be to recursively walk the tree. It works, but it can take thousands of round trips to the database.
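If the subgraph is a tree hanging off its root, the walk itself can usually be pushed into a single query with a variable-length pattern, so it costs one round trip instead of thousands. A minimal sketch, assuming the root carries a ROOT label and an `id` property, and that every member is reachable from the root by following relationships outward:

```cypher
// Assumption: relationships point from parent to child, away from the root.
// [*0..] also matches zero hops, so the root itself is included.
MATCH (root:ROOT {id: $my_graph_id})-[*0..]->(n)
// A node reachable by more than one path would otherwise appear repeatedly.
RETURN DISTINCT n
```

On 50,000 nodes an unbounded expansion can still be costly; this trades client round trips for server-side work rather than making the traversal free.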

Another way is to add a subgraph identifier property to each node:

MATCH (n {subgraph_id: $my_graph_id}) RETURN n

Another way would be to relate each node in a subgraph to the subgraph's root:

MATCH (n)-->(root:ROOT {id: $my_graph_id}) RETURN n

This feels more "graphy", if that matters, but it seems expensive.

Or, I could add a label to each node. If the subgraph ID were "BOBS_QA_COPY", then

MATCH (n:BOBS_QA_COPY) RETURN n

would scoop up all the nodes in the subgraph.
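To try the label approach on an existing tree, the label has to be stamped onto every member once. A sketch, assuming the root node has the ROOT label and `id` "BOBS_QA_COPY", and that every member is reachable from it along outgoing relationships. In most Neo4j versions a label cannot be passed as a query parameter, so it is written literally here:

```cypher
// Assumption: the root is (:ROOT {id: "BOBS_QA_COPY"}) and edges point away from it.
// [*0..] includes the root itself in the match.
MATCH (root:ROOT {id: "BOBS_QA_COPY"})-[*0..]->(n)
// Adding a label a node already has is a no-op, so repeated paths are harmless.
SET n:BOBS_QA_COPY
```

Dropping the grouping later is the mirror image: `MATCH (n:BOBS_QA_COPY) REMOVE n:BOBS_QA_COPY`.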

My question is: when is it appropriate to use a garden-variety property, to add relationships, or to set a label?

Setting a label to identify a particular subgraph makes me feel weird, like I am abusing the tool. I expect labels to say what something is, not which instance of something it is.

For example, if we were graphing car information, I could see having parts labeled "FORD EXPLORER". But I am less sure it would make sense to have parts labeled "TONYS FORD EXPLORER". Now, I could see a (:USER {id: "Tony"}) node having a relationship to a FORD EXPLORER graph...

I may be having a bout of "SQL brain"...

Answers (1)

cybersam

Let's work this through, step by step.

  1. If there are N non-root nodes, adding N extra relationships to the ROOT node makes the least sense. It is very expensive in storage; it pollutes the data model with relationships that don't need to be there, which can unnecessarily complicate queries that traverse paths; and it is not the fastest way to find all the nodes in a subgraph.

  2. Adding a subgraph ID property to every node is also expensive in storage (though less so), and would require either (a) scanning every node to find all the nodes with a specific ID (slow), or (b) using an index, say on :Node(subgraph_id) (faster). Approach (b), which is preferable, would also require that all the nodes share the same label (Node, in this example).

  3. But wait: if approach 2(b) already requires all nodes to be labeled, why not just use a different label for each subgraph? That way we don't need the subgraph_id property at all, and we don't need an index either! And finding all the nodes with the same label is fast.
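Approaches 2(b) and 3 can be sketched side by side. This assumes Neo4j 4.1 or later (for the named `CREATE INDEX ... IF NOT EXISTS FOR ... ON` syntax), the shared Node label from point 2, and a hypothetical index name `node_subgraph_id`:

```cypher
// Approach 2(b): a property on every node, backed by an index on (Node, subgraph_id).
// node_subgraph_id is a hypothetical index name.
CREATE INDEX node_subgraph_id IF NOT EXISTS
FOR (n:Node) ON (n.subgraph_id);

MATCH (n:Node {subgraph_id: $my_graph_id})
RETURN n;

// Approach 3: one label per subgraph; no property and no extra index to maintain.
MATCH (n:BOBS_QA_COPY)
RETURN n;
```

The contrast is the one the answer describes: 2(b) carries an extra property per node plus an index that must be kept up to date, while 3 relies only on the label lookup Neo4j already maintains.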

Thus, using a per-subgraph label would be the best option.
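One way to check this conclusion against real data is to prefix each candidate query with `PROFILE` and compare the plans and database hits. A sketch reusing the illustrative names from the question, with `my_graph_id` supplied as a query parameter:

```cypher
// Label approach.
PROFILE MATCH (n:BOBS_QA_COPY) RETURN n;

// Indexed property approach (assumes an index on :Node(subgraph_id) exists).
PROFILE MATCH (n:Node {subgraph_id: $my_graph_id}) RETURN n;

// Relationship-to-root approach from the question.
PROFILE MATCH (n)-->(root:ROOT {id: $my_graph_id}) RETURN n;
```

Returning `n` rather than `count(n)` is deliberate: a bare label count can be answered from Neo4j's count store, which would hide the lookup cost being compared. Exact operator names in the plans differ between Neo4j versions.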
