Logical Fallacy

Reputation: 3107

Are labels auto-indexed in Neo4j?

Totally new to graph databases -- corrections welcome.

If I want to obtain a list of all nodes with the "User" label, does Neo4j (or possibly other graph databases) need to search every node for that label, or does it somehow auto-index nodes by label?

Without indexing (and with horrible performance), every node would be checked to see whether any of its labels matches "User", like so:

List<Node> userNodes = new ArrayList<>();
for (Node node : all_nodes)
{
  for (Label label : node.labels())
  {
    // compare string contents, not object references
    if ("User".equals(label.name()))
    {
      userNodes.add(node);

      // no need to look at other labels for this node
      break;
    }
  }
}
return userNodes;

With indexing, the system would look up a system-managed "node" that has all of the label names under it (a search space of dozens instead of millions) and return the children of the matching one:

  List<Node> userNodes = new ArrayList<>();
  for (Node labelNode : labels_node) // where labels_node is system-managed
  {
    // compare string contents, not object references
    if ("User".equals(labelNode.name()))
    {
      // All children of the "User" node have the label "User"
      userNodes = labelNode.children();

      // No need to look at other labels
      break;
    }
  }
  return userNodes;

Ultimately, I think the question comes down to this: if I am building a list of "things" and need to retrieve all of them by type, should I use labels to accomplish this? Or should I instead create my own "Users" node, which points to all nodes that are users, and only use labels once I have found the subset of nodes I want?

This question seems similar, though vaguer, but it did not receive a satisfactory answer.

Upvotes: 2

Views: 1302

Answers (3)

Michael Hunger

Reputation: 41676

There is a dedicated method in GlobalGraphOperations:

// gdb is your GraphDatabaseService instance
GlobalGraphOperations ops = GlobalGraphOperations.at(gdb);
for (Node node : ops.getAllNodesWithLabel(DynamicLabel.label("User"))) {
  // do something with node
}

which uses the optimized label-scan-store behind the scenes.

Upvotes: 0

FrobberOfBits

Reputation: 18002

Terminology-wise, the docs talk about "labels and schema indexes". An "index" is something you attach to a property of a label, such as indexing the first_name property of all :Person nodes.

But for your question, labels behave like indexes: yes, the execution engine takes advantage of them and uses them the way you'd expect it to use an index, even though the documentation doesn't describe labels as indexes.

So, for a concrete example, suppose we had a graph of 1 million nodes, of which 5 had the label :Person. And suppose we had the following query:

MATCH (p:Person) RETURN p;

The question boils down to: how many nodes does Cypher have to consider? The answer is 5, not 1 million.
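
To make the 5-versus-1-million point concrete, here is a minimal Java sketch of the general idea behind a label lookup. It is a toy in-memory model and not Neo4j's actual label scan store; the class LabelIndexSketch and all of its method names are made up for illustration. It keeps a label -> node ids map alongside the nodes and counts how many nodes each kind of lookup has to touch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only (assumption): illustrates why a label lookup does not
// need to touch every node. This is NOT how Neo4j stores data internally.
final class LabelIndexSketch {
    // Hypothetical node store: node id -> labels on that node.
    private final Map<Long, Set<String>> labelsByNode = new HashMap<>();
    // Hypothetical label index: label -> ids of nodes carrying that label.
    private final Map<String, List<Long>> nodesByLabel = new HashMap<>();
    // Number of nodes inspected by the most recent lookup.
    private long visited;

    // Adds a node and updates the label index in the same step.
    void addNode(long id, String... labels) {
        if (labelsByNode.containsKey(id)) {
            throw new IllegalArgumentException("duplicate node id " + id);
        }
        Set<String> labelSet = new HashSet<>();
        for (String label : labels) {
            if (labelSet.add(label)) { // skip labels repeated in the arguments
                nodesByLabel.computeIfAbsent(label, k -> new ArrayList<>()).add(id);
            }
        }
        labelsByNode.put(id, labelSet);
    }

    // No index: every node is inspected, whatever its labels are.
    List<Long> findByFullScan(String label) {
        visited = 0;
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, Set<String>> entry : labelsByNode.entrySet()) {
            visited++;
            if (entry.getValue().contains(label)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    // With the index: only nodes that carry the label are touched.
    List<Long> findByLabelIndex(String label) {
        List<Long> ids = nodesByLabel.getOrDefault(label, Collections.emptyList());
        visited = ids.size();
        return Collections.unmodifiableList(ids);
    }

    long visited() {
        return visited;
    }

    public static void main(String[] args) {
        LabelIndexSketch graph = new LabelIndexSketch();
        // Scaled down to 100,000 nodes to keep memory use small; 5 are "Person".
        for (long id = 0; id < 100_000; id++) {
            if (id % 20_000 == 0) {
                graph.addNode(id, "Person");
            } else {
                graph.addNode(id);
            }
        }
        graph.findByFullScan("Person");
        System.out.println("full scan visited:   " + graph.visited());
        graph.findByLabelIndex("Person");
        System.out.println("label index visited: " + graph.visited());
    }
}
```

The trade-off in this sketch is that every write also has to update the label map, which is the price of making reads by label proportional to the number of matching nodes rather than to the size of the graph.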

Your second code snippet is more of a Neo4j 1.9 kind of approach. Nowadays I wouldn't create these artificial "index nodes" or loop through all possible labels; I'd just match by label and be done with it.

Upvotes: 8

Christophe Willemsen

Reputation: 20185

Yes, labels are indexed automatically. If you have 1000 user nodes of which 700 are active users, querying for the Active label returns only the 700 active users without looking at the others.

Creating super nodes and connecting the related nodes to them is almost always a bad idea.

Also, you should model your database around the queries you need to run; see this amazing answer:

Neo4J - Storing into relationship vs nodes

There is also the question of when to use labels and when to use indexed properties on nodes; this blog post explains it very well:

http://graphaware.com/neo4j/2015/01/16/neo4j-graph-model-design-labels-versus-indexed-properties.html

You should also profile your queries. It makes no sense to start by importing 1 million nodes; try with 100 first and run some queries.

I heard an amazing sentence from someone at the Neo4j HQ:

Be faithful to your graph and the graph will be faithful to you

Find a way of doing it that solves your problem!

Upvotes: 1
