Which indexing system should I use?

Question

I'm currently using py2neo to interface with my neo4j server. One thing that I'd like to do is enforce a uniqueness constraint for a label (i.e. enforce a unique client-generated hash on the server side). For the sake of example, I have the following schema:

ON :Organization(uid)    ONLINE (for uniqueness constraint)

Since I'm using py2neo, my normal node creation sequence usually entails:

1. Generate the UID hash based on properties of the organization
2. Add it to the database
3. Add the "Organization" label to the hydrated node returned by the add statement

This works just fine. When I go to create a duplicate node, I:

1. Generate the UID hash based on properties of the organization
2. Add it to the database
3. Attempt to add the "Organization" label, which fails due to the uniqueness constraint.

The problem with the step above is that I now have a label-less duplicate node on my graph. Instead I'd like to get a reference to the existing node since this is usually executed within the context of relationship creation. To accomplish this, I need to be able to create the node and label it before adding it to the graph, which currently cannot be done cleanly with py2neo/the REST API. I can't use the batch API as that fails with the same error (and doesn't return a copy of the existing node).

A workaround is:

1. Generate the UID hash based on properties of the organization
2. Query the database for a node with that hash
3. If it exists, use that; otherwise add a node to the database and then add the "Organization" label to the node.

The downside is that I'm performing extra network requests as well as avoidable I/O. The Cypher analogue I'm looking for is MERGE. It seems I have two options here:

1. Instead of using the standard graph create operation, I convert the node abstract to a Cypher MERGE statement and execute that.
2. Fall back to the "legacy" indexing system which provides a get_or_create method.
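
For illustration, here is a minimal sketch of what the first option could look like. The helper names `organization_uid` and `build_merge`, and the choice of SHA-1 over a sorted JSON dump as the hashing scheme, are assumptions made for the example and not part of py2neo. The statement uses the `$param` syntax of current Neo4j releases (older 2.x servers used `{param}` instead). The sketch only builds the statement and its parameters; running it is left to whichever Cypher execution call the installed py2neo version provides.

```python
import hashlib
import json


def organization_uid(properties):
    """Derive a deterministic hash from the properties (hypothetical scheme)."""
    # Sorting the keys makes the hash independent of dict ordering.
    canonical = json.dumps(properties, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def build_merge(label, key, properties):
    """Return (statement, parameters) for a MERGE keyed on one unique property."""
    # Labels and property keys cannot be passed as Cypher parameters, so they
    # are interpolated here; only use fixed, trusted identifiers for them.
    statement = (
        "MERGE (n:%s {%s: $uid}) "
        "ON CREATE SET n += $props "
        "RETURN n" % (label, key)
    )
    parameters = {"uid": properties[key], "props": properties}
    return statement, parameters


org = {"name": "Acme", "country": "US"}
org["uid"] = organization_uid({"name": org["name"], "country": org["country"]})
statement, parameters = build_merge("Organization", "uid", org)
```

Because the MERGE matches on the constrained property alone, a second call with the same `uid` should return the existing node and leave no label-less duplicate behind.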

The legacy indexing system also seems to provide a better short-term outlook in that I can create full text indices, and it seems as if I also get better performance out of it. Any thoughts/suggestions?

Michael Hunger · Accepted Answer

I'd say use MERGE, which also does the correct locking and guarantees the uniqueness of your node.

As far as I know, the uniqueness check is performed immediately, though I'm not sure how visible concurrent changes from other threads are at that point. MERGE takes an index lock, which ensures that only one thread at a time checks the uniqueness constraint.
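
The effect of that lock can be illustrated with a small in-memory analogy. This is not Neo4j code and says nothing about how the server implements it; `UniqueStore` and its `merge` method are hypothetical names, and a `threading.Lock` stands in for the index lock. The point is only that doing the lookup and the create under one lock yields a single node even when several threads race on the same uid.

```python
import threading


class UniqueStore:
    """In-memory stand-in for a graph with a uniqueness constraint."""

    def __init__(self):
        self._lock = threading.Lock()  # plays the role of the index lock
        self._nodes = {}
        self.created = 0

    def merge(self, uid, properties):
        # The lookup and the insert happen under the same lock, so no other
        # thread can add the same uid between the check and the write.
        with self._lock:
            node = self._nodes.get(uid)
            if node is None:
                node = dict(properties, uid=uid)
                self._nodes[uid] = node
                self.created += 1
            return node


store = UniqueStore()
results = []


def worker():
    results.append(store.merge("abc123", {"name": "Acme"}))


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Every thread gets a reference to the same node, which is the behavior the question asks for. The query-then-create workaround lacks this guarantee because another client can insert between the two requests.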
