Reputation: 25770
In my Neo4j/SDN 4 application, all of my Cypher queries are based on internal Neo4j IDs.
This is a problem because I can't rely on these IDs in my web application URLs. Neo4j can reuse these IDs, so there is a good chance that at some point in the future the same ID will refer to a completely different node.
I tried to re-implement this logic based on the following solution: "Using the graph to control unique id generation", but noticed a degradation in query performance.
From a theoretical point of view, should a Cypher query based on a property annotated with @Index(unique = true, primary = true), for example:
@Index(unique = true, primary = true)
private Long uid;
entity.uid = {someId}
perform as well as a Cypher query based on the internal Neo4j ID:
id(entity) = {someId}
UPDATED
This is the :schema output:
Indexes
ON :BaseEntity(uid) ONLINE
ON :Characteristic(lowerName) ONLINE
ON :CharacteristicGroup(lowerName) ONLINE
ON :Criterion(lowerName) ONLINE
ON :CriterionGroup(lowerName) ONLINE
ON :Decision(lowerName) ONLINE
ON :FlagType(name) ONLINE (for uniqueness constraint)
ON :HAS_VALUE_ON(value) ONLINE
ON :HistoryValue(originalValue) ONLINE
ON :Permission(code) ONLINE (for uniqueness constraint)
ON :Role(name) ONLINE (for uniqueness constraint)
ON :User(email) ONLINE (for uniqueness constraint)
ON :User(username) ONLINE (for uniqueness constraint)
ON :Value(value) ONLINE
Constraints
ON ( flagtype:FlagType ) ASSERT flagtype.name IS UNIQUE
ON ( permission:Permission ) ASSERT permission.code IS UNIQUE
ON ( role:Role ) ASSERT role.name IS UNIQUE
ON ( user:User ) ASSERT user.email IS UNIQUE
ON ( user:User ) ASSERT user.username IS UNIQUE
As you can see, I have an index on :BaseEntity(uid). BaseEntity is a base class in my entity hierarchy, for example:
@NodeEntity
public abstract class BaseEntity {
    @GraphId
    private Long id;
    @Index(unique = false)
    private Long uid;
    private Date createDate;
    private Date updateDate;
    ...
}
@NodeEntity
public class Commentable extends BaseEntity {
    ...
}
@NodeEntity
public class Decision extends Commentable {
    private String name;
}
Will this uid index be used when I look up, for example, (d:Decision) WHERE d.uid = {uid}?
PROFILE results - internal ID vs indexed property
Query based on internal ID
PROFILE MATCH (parentD)-[:CONTAINS]->(childD:Decision)
WHERE id(parentD) = 1474333
MATCH (childD)-[relationshipValueRel1475199:HAS_VALUE_ON]-(filterCharacteristic1475199)
WHERE id(filterCharacteristic1475199) = 1475199
WITH relationshipValueRel1475199, childD
WHERE ([1, 19][0] <= relationshipValueRel1475199.value <= [1, 19][1] )
WITH childD
MATCH (childD)-[relationshipValueRel1474358:HAS_VALUE_ON]-(filterCharacteristic1474358)
WHERE id(filterCharacteristic1474358) = 1474358
WITH relationshipValueRel1474358, childD
WHERE (ANY (id IN ['Compact'] WHERE id IN relationshipValueRel1474358.value ))
WITH childD
MATCH (childD)-[relationshipValueRel1475193:HAS_VALUE_ON]-(filterCharacteristic1475193)
WHERE id(filterCharacteristic1475193) = 1475193
WITH relationshipValueRel1475193, childD
WHERE (ANY (id IN ['16:9', '3:2', '4:3', '1:1']
WHERE id IN relationshipValueRel1475193.value ))
WITH childD
OPTIONAL MATCH (childD)-[vg:HAS_VOTE_ON]->(c)
WHERE id(c) IN [1474342, 1474343, 1474340, 1474339, 1474336, 1474352, 1474353, 1474350, 1474351, 1474348, 1474346, 1474344]
WITH childD, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes
WITH * MATCH (childD)-[ru:CREATED_BY]->(u:User)
WITH ru, u, childD , toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes
ORDER BY weight DESC
SKIP 0 LIMIT 10
RETURN ru, u, childD AS decision, weight, totalVotes,
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD) | {entityId: id(entity), types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups,
[ (parentD)<-[:DEFINED_BY]-(c1)<-[vg1:HAS_VOTE_ON]-(childD) | {criterionId: id(c1), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria,
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[v1:HAS_VALUE_ON]-(childD) WHERE NOT ((ch1)<-[:DEPENDS_ON]-()) | {characteristicId: id(ch1), value: v1.value, totalHistoryValues: toInt(v1.totalHistoryValues), description: v1.description, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics
PROFILE output:
Cypher version: CYPHER 3.1, planner: COST, runtime: INTERPRETED. 350554 total db hits in 238 ms.
Query based on indexed property uid
PROFILE MATCH (parentD)-[:CONTAINS]->(childD:Decision)
WHERE parentD.uid = 61
MATCH (childD)-[relationshipValueRel1475199:HAS_VALUE_ON]-(filterCharacteristic1475199)
WHERE filterCharacteristic1475199.uid = 15
WITH relationshipValueRel1475199, childD
WHERE ([1, 19][0] <= relationshipValueRel1475199.value <= [1, 19][1] )
WITH childD
MATCH (childD)-[relationshipValueRel1474358:HAS_VALUE_ON]-(filterCharacteristic1474358)
WHERE filterCharacteristic1474358.uid = 10
WITH relationshipValueRel1474358, childD
WHERE (ANY (id IN ['Compact'] WHERE id IN relationshipValueRel1474358.value ))
WITH childD
MATCH (childD)-[relationshipValueRel1475193:HAS_VALUE_ON]-(filterCharacteristic1475193)
WHERE filterCharacteristic1475193.uid = 14
WITH relationshipValueRel1475193, childD
WHERE (ANY (id IN ['16:9', '3:2', '4:3', '1:1']
WHERE id IN relationshipValueRel1475193.value ))
WITH childD
OPTIONAL MATCH (childD)-[vg:HAS_VOTE_ON]->(c)
WHERE c.uid IN [26, 27, 24, 23, 20, 36, 37, 34, 35, 32, 30, 28]
WITH childD, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes
WITH * MATCH (childD)-[ru:CREATED_BY]->(u:User)
WITH ru, u, childD , toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes
ORDER BY weight DESC
SKIP 0 LIMIT 10
RETURN ru, u, childD AS decision, weight, totalVotes,
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD) | {entityId: id(entity), types: labels(entity), totalComments: toInt(comg.totalComments)} ] AS commentGroups,
[ (parentD)<-[:DEFINED_BY]-(c1)<-[vg1:HAS_VOTE_ON]-(childD) | {criterionId: id(c1), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes)} ] AS weightedCriteria,
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[v1:HAS_VALUE_ON]-(childD) WHERE NOT ((ch1)<-[:DEPENDS_ON]-()) | {characteristicId: id(ch1), value: v1.value, totalHistoryValues: toInt(v1.totalHistoryValues), description: v1.description, valueType: ch1.valueType, visualMode: ch1.visualMode} ] AS valuedCharacteristics
Cypher version: CYPHER 3.1, planner: COST, runtime: INTERPRETED. 671326 total db hits in 426 ms.
Is there any way to improve the performance of the uid-based query?
Upvotes: 2
Views: 105
Reputation: 15076
You are right not to use Neo4j internal IDs in web URLs, as they can be reused, for example after a node is deleted.
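To make the reuse problem concrete, here is a minimal toy sketch in Java. It is not Neo4j code and does not claim to mirror Neo4j's actual allocation strategy; the class IdReuseSketch and its free-list behaviour are assumptions made purely for illustration. It only shows why a bookmarked internal ID can end up pointing at a different node once the original is deleted.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Toy model only (hypothetical class): a store that hands out a previously
// freed id before allocating a new one.
class IdReuseSketch {
    private final Map<Long, String> nodes = new HashMap<>();
    private final Deque<Long> freedIds = new ArrayDeque<>();
    private long nextId = 0;

    // Reuse a freed id if one is available, otherwise allocate the next one.
    long create(String name) {
        long id = freedIds.isEmpty() ? nextId++ : freedIds.pop();
        nodes.put(id, name);
        return id;
    }

    // Deleting a node makes its id available again.
    void delete(long id) {
        nodes.remove(id);
        freedIds.push(id);
    }

    String get(long id) {
        return nodes.get(id);
    }

    public static void main(String[] args) {
        IdReuseSketch store = new IdReuseSketch();
        long bookmarked = store.create("Decision A"); // id ends up in a URL
        store.delete(bookmarked);
        store.create("Unrelated node");               // takes over the freed id
        System.out.println(store.get(bookmarked));    // no longer "Decision A"
    }
}
```

A separately generated uid property avoids this, because its value is never handed out again by the storage layer.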
From a performance point of view, the internal ID is as fast as you can get: it is actually an offset into the file holding the node/relationship records (you may have noticed that these are two separate ID sequences, so you can have a node with id=x and a relationship with the same id=x).
Any use of an index has to be slower, because the database does an index lookup first, gets the internal ID and then reads the node record.
However, for the vast majority of applications the difference in performance is negligible: it will likely be much smaller than network latency or general OGM overhead.
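The difference between the two access paths can be sketched with a toy model. This is not Neo4j's storage code: the class LookupSketch, the RECORD_SIZE value and the in-memory array standing in for the store file are all hypothetical. It only illustrates that an ID lookup is a direct positional read, while a property lookup needs one extra step to resolve the internal ID first.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; the record size and layout are illustrative assumptions,
// not Neo4j's real on-disk format.
class LookupSketch {
    static final int RECORD_SIZE = 16;                         // hypothetical fixed record size in bytes
    private final String[] records;                            // stands in for the node store file
    private final Map<Long, Long> uidIndex = new HashMap<>();  // uid -> internal id

    LookupSketch(int capacity) {
        records = new String[capacity];
    }

    void put(long internalId, long uid, String payload) {
        records[(int) internalId] = payload;
        uidIndex.put(uid, internalId);
    }

    // Internal id: the record position follows directly from the id.
    static long offsetOf(long internalId) {
        return internalId * RECORD_SIZE;
    }

    String byInternalId(long internalId) {
        return records[(int) internalId];
    }

    // Indexed property: one extra lookup to resolve the internal id,
    // followed by the same record read as above.
    String byUid(long uid) {
        Long internalId = uidIndex.get(uid);
        return internalId == null ? null : byInternalId(internalId);
    }

    public static void main(String[] args) {
        LookupSketch store = new LookupSketch(8);
        store.put(5, 61, "parent decision");
        System.out.println(offsetOf(5));
        System.out.println(store.byInternalId(5));
        System.out.println(store.byUid(61));
    }
}
```

Both paths end in the same record read; the index path simply adds a step in front of it, which is why it can approach but not beat a direct ID lookup.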
If you see a noticeable difference:
- check that the index is in place (:schema in Neo4j browser)
- check the queries being executed (set the info log level for org.neo4j.ogm)
- use PROFILE to check the query plan
UPDATED
Yes, the index will be used for queries like:
MATCH (d:Decision) WHERE d.uid = {uid} ...
which should be generated by session.load(Decision.class, uid) if your index is primary, or by findByUid on DecisionRepository.
Beware that the index might not be used when the WHERE clause appears in the middle of the query:
...
WITH x
MATCH (x)-[...]-(d) WHERE d.uid = {uid} ...
This depends on the query plan, and you should use PROFILE to investigate it.
Upvotes: 5