Nitzan Tomer


Indexing in Neo4j

I'm wondering what the better approach is when I need multiple indexes based on some node type or field. For example, let's say I want a graph of students, and I want to index them by their school and id.

As I understand it, I can have an index per school like this:

// add student
Index<Node> index = this.graphDb.index().forNodes(schoolName);
Node node = this.graphDb.createNode();
node.setProperty("id", studentId);
index.add(node, "id", studentId);

// get student
Index<Node> index = this.graphDb.index().forNodes(schoolName);
Node node = index.get("id", studentId).getSingle();

Alternatively, I can use a single index and do something like:

// add student
Index<Node> index = this.graphDb.index().forNodes("schools");
Node node = this.graphDb.createNode();
node.setProperty("id", studentId);
index.add(node, schoolName + ":id", studentId);

// get student
Index<Node> index = this.graphDb.index().forNodes("schools");
Node node = index.get(schoolName + ":id", studentId).getSingle();
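
To make the difference between the two layouts concrete, here is a toy in-memory model in plain Java. It is not the Neo4j API: `ToyIndex`, `PerSchoolIndexes` and the school names are hypothetical, and each index is reduced to a key -> value -> node-id map. It only shows where the school name ends up: in the index name in the first layout, and in the key in the second.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for one named index (NOT the Neo4j API):
// key -> value -> node id. Assumes one node per (key, value), like getSingle().
class ToyIndex {
    private final Map<String, Map<Object, Long>> entries = new HashMap<>();

    void add(long nodeId, String key, Object value) {
        entries.computeIfAbsent(key, k -> new HashMap<>()).put(value, nodeId);
    }

    Long get(String key, Object value) {
        Map<Object, Long> byValue = entries.get(key);
        return byValue == null ? null : byValue.get(value);
    }

    int keyCount() { return entries.size(); }
}

// Layout 1: one index per school, selected by index name (created on first use).
class PerSchoolIndexes {
    private final Map<String, ToyIndex> indexes = new HashMap<>();

    ToyIndex forNodes(String schoolName) {
        return indexes.computeIfAbsent(schoolName, n -> new ToyIndex());
    }

    int indexCount() { return indexes.size(); }
}

class IndexLayoutDemo {
    public static void main(String[] args) {
        // Layout 1: the school name selects the index.
        PerSchoolIndexes perSchool = new PerSchoolIndexes();
        perSchool.forNodes("Lincoln").add(1L, "id", 42);
        perSchool.forNodes("Roosevelt").add(2L, "id", 42);

        // Layout 2: a single index; the school name is part of the key.
        ToyIndex schools = new ToyIndex();
        schools.add(1L, "Lincoln" + ":id", 42);
        schools.add(2L, "Roosevelt" + ":id", 42);

        System.out.println(perSchool.forNodes("Roosevelt").get("id", 42)); // 2
        System.out.println(schools.get("Roosevelt:id", 42));               // 2
        System.out.println(perSchool.indexCount()); // 2 indexes
        System.out.println(schools.keyCount());     // 2 keys in one index
    }
}
```

In both layouts the same student id can appear in two schools without colliding; the per-school layout multiplies the number of indexes, while the single-index layout multiplies the number of keys inside one index.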

Which approach is better? Does either have advantages over the other, especially in terms of performance or storage when a lot of nodes are involved?

Thanks


Answers (1)

Michael Hunger


Your second approach (a single index with the school name in the key) is perfectly valid. If you want to query all students of a school, you can use:

// the key itself contains a colon, so pass it separately instead of inside the query string
Iterable<Node> pupils = index.query(schoolName + ":id", "*");
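
Why a `*` query can list every student of a school: Lucene keeps the terms of each field sorted, so a trailing-wildcard match is roughly a scan over a contiguous range of that field's terms. The sketch below imitates this with a `TreeMap`; `SortedTermIndex` is a hypothetical stand-in, not Lucene or Neo4j code, and it assumes the school-prefixed key layout from the question.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy term dictionary (NOT Neo4j or Lucene code): for each
// field, terms are kept sorted, so "prefix*" becomes a range scan.
class SortedTermIndex {
    private final Map<String, TreeMap<String, Set<Long>>> fields = new TreeMap<>();

    void add(long nodeId, String field, Object value) {
        fields.computeIfAbsent(field, f -> new TreeMap<>())
              .computeIfAbsent(String.valueOf(value), t -> new TreeSet<>())
              .add(nodeId);
    }

    // Matches "prefix*" within one field; an empty prefix means "*" (every term).
    Set<Long> prefixQuery(String field, String prefix) {
        Set<Long> hits = new TreeSet<>();
        TreeMap<String, Set<Long>> terms = fields.get(field);
        if (terms == null) {
            return hits;
        }
        for (Map.Entry<String, Set<Long>> e : terms.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // terms are sorted, so no later term can match either
            }
            hits.addAll(e.getValue());
        }
        return hits;
    }
}

class WildcardDemo {
    public static void main(String[] args) {
        SortedTermIndex index = new SortedTermIndex();
        index.add(1L, "Lincoln:id", 42);
        index.add(2L, "Lincoln:id", 43);
        index.add(3L, "Roosevelt:id", 42);

        // Every student of one school: a "*" query on that school's key.
        System.out.println(index.prefixQuery("Lincoln:id", ""));   // [1, 2]
        System.out.println(index.prefixQuery("Lincoln:id", "43")); // [2]
    }
}
```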

You can also add the two fields to the index separately:

index.add(node, "schoolName", schoolName);
index.add(node, "id", studentId);

and then query them with a combined query:

Iterable<Node> pupils = index.query("schoolName:" + schoolName + " AND id:" + studentId);

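The first approach (school name in the key) gives a smaller index, while the second (separate fields) is more powerful, since each field can be queried on its own or combined with others. Performance-wise it won't make a big difference (but you can test it and report back).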
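
Roughly speaking, an `AND` query intersects the sets of nodes that match each term, and the size difference comes from the number of index entries per student: one with the combined key, two with separate fields. The toy inverted index below (a hypothetical `InvertedIndex` class, not the Lucene or Neo4j API) illustrates both points; counting entries is only a rough proxy for on-disk size, which also depends on Lucene's term dictionary and compression.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical inverted index (NOT the Neo4j/Lucene API): each (field, value)
// term maps to the set of node ids ("postings") that carry it.
class InvertedIndex {
    private final Map<String, Set<Long>> postings = new HashMap<>();
    private int entryCount = 0;

    private static String term(String field, Object value) {
        return field + "=" + value;
    }

    void add(long nodeId, String field, Object value) {
        // Only count entries that were not already present.
        if (postings.computeIfAbsent(term(field, value), t -> new TreeSet<>()).add(nodeId)) {
            entryCount++;
        }
    }

    Set<Long> get(String field, Object value) {
        return postings.getOrDefault(term(field, value), new TreeSet<>());
    }

    // "f1:v1 AND f2:v2": keep only node ids present in both postings sets.
    Set<Long> and(String f1, Object v1, String f2, Object v2) {
        Set<Long> result = new TreeSet<>(get(f1, v1));
        result.retainAll(get(f2, v2));
        return result;
    }

    int entryCount() { return entryCount; }
}

class CombinedQueryDemo {
    public static void main(String[] args) {
        // Combined key: one entry per student.
        InvertedIndex composite = new InvertedIndex();
        composite.add(1L, "Lincoln:id", 42);
        composite.add(2L, "Roosevelt:id", 42);

        // Separate fields: two entries per student, but freely combinable.
        InvertedIndex separate = new InvertedIndex();
        separate.add(1L, "schoolName", "Lincoln");
        separate.add(1L, "id", 42);
        separate.add(2L, "schoolName", "Roosevelt");
        separate.add(2L, "id", 42);

        System.out.println(composite.entryCount()); // 2
        System.out.println(separate.entryCount());  // 4
        System.out.println(separate.and("schoolName", "Roosevelt", "id", 42)); // [2]
        System.out.println(separate.get("id", 42)); // [1, 2] (any school)
    }
}
```

The last query, "id 42 in any school", has no direct equivalent with the combined key, which is where the extra flexibility of separate fields comes from.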

You could also model this as a graph structure, where a school is a node and the pupils are attached to it by a LEARNS_AT relationship. That relationship can also carry start and end temporal properties, which makes it easier to model your domain. See this demo graph.
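
A minimal sketch of that graph model in plain Java, with hypothetical `School`, `Student` and `LearnsAt` classes standing in for the nodes and the LEARNS_AT relationship (this is not Neo4j code). It assumes `end` is exclusive and that a missing end date means the student is still enrolled. Listing a school's students on a given day then becomes a walk over the school's relationships instead of an index lookup.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical node class for a school (not the Neo4j API).
final class School {
    final String name;
    // LEARNS_AT relationships pointing at this school.
    final List<LearnsAt> incoming = new ArrayList<>();

    School(String name) { this.name = name; }

    // Creates (student)-[:LEARNS_AT {start, end}]->(this school).
    LearnsAt enroll(Student student, LocalDate start, LocalDate end) {
        LearnsAt rel = new LearnsAt(student, this, start, end);
        incoming.add(rel);
        return rel;
    }
}

// Hypothetical node class for a student.
final class Student {
    final long id;
    Student(long id) { this.id = id; }
}

// The relationship carries its own temporal properties.
// Assumption: end is exclusive, and end == null means "still enrolled".
final class LearnsAt {
    final Student student;
    final School school;
    final LocalDate start;
    final LocalDate end;

    LearnsAt(Student student, School school, LocalDate start, LocalDate end) {
        this.student = student;
        this.school = school;
        this.start = start;
        this.end = end;
    }

    boolean activeOn(LocalDate day) {
        return !day.isBefore(start) && (end == null || day.isBefore(end));
    }
}

class SchoolGraphDemo {
    // Students of a school on a given day: walk its LEARNS_AT relationships.
    static List<Long> studentsOn(School school, LocalDate day) {
        List<Long> ids = new ArrayList<>();
        for (LearnsAt rel : school.incoming) {
            if (rel.activeOn(day)) {
                ids.add(rel.student.id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        School lincoln = new School("Lincoln");
        lincoln.enroll(new Student(42), LocalDate.of(2010, 9, 1), LocalDate.of(2013, 6, 30));
        lincoln.enroll(new Student(43), LocalDate.of(2012, 9, 1), null);

        System.out.println(studentsOn(lincoln, LocalDate.of(2011, 1, 1))); // [42]
        System.out.println(studentsOn(lincoln, LocalDate.of(2014, 1, 1))); // [43]
    }
}
```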

