Neo4j/Cypher _LOCK_ good or bad?

Question

Suppose you have a (:User) node with a SubscribersCount property.

Each time someone subscribes to or unsubscribes from a user, User.SubscribersCount should be updated accordingly. A [:SUBSCRIBED] relationship is also created or deleted on each such action.

In this case, there are two ways to keep the counter up to date:

1. count all incoming [:SUBSCRIBED] relationships in real time
2. acquire a write lock on the (:User) node and increment/decrement the counter

The first approach will degrade as the number of subscribers grows.

What about the second approach? What are its downsides?

cybersam · Accepted Answer

[UPDATED (thrice)]

To start the discussion, here are example Cypher queries for your two options.

1. Calculate the relationship count as needed (without a SubscribersCount property):

(a) ADD a relationship:
```
    MATCH (u:User {id:1234}), (v:User {id: 5678})
    CREATE (u)<-[:SUBSCRIBED]-(v);
```
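Note that CREATE adds a new relationship every time it runs, so a repeated subscribe request would leave duplicate SUBSCRIBED relationships and inflate the count. A minimal sketch of an idempotent variant, assuming each subscriber should have at most one SUBSCRIBED relationship to a given user (same example ids as above):
```
    // MERGE creates the relationship only if no matching one already exists
    MATCH (u:User {id:1234}), (v:User {id: 5678})
    MERGE (u)<-[:SUBSCRIBED]-(v);
```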
(b) GET the count using SIZE [according to @NicoleWhite, this should execute in constant time as long as the pattern used does not specify a label for the subscriber's node and u is already cached]:
```
    MATCH (u:User {id:1234})
    RETURN SIZE((u)<-[:SUBSCRIBED]-());
```
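For reference, newer Neo4j releases (5.x, as far as I know) no longer accept a pattern expression inside SIZE and offer a COUNT subquery instead. A sketch of the equivalent query, assuming Neo4j 5 or later; the alias subscribers is only illustrative, and whether this form gets the same constant-time treatment is an assumption worth checking with PROFILE:
```
    // Assumes Neo4j 5+; same unlabeled pattern as the SIZE version
    MATCH (u:User {id:1234})
    RETURN COUNT { (u)<-[:SUBSCRIBED]-() } AS subscribers;
```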
(c) (deprecated) GET the count using COUNT [Neo4j has to iterate through all relationships (of all types) for that user]:
```
    MATCH (u:User {id:1234})<-[:SUBSCRIBED]-()
    RETURN COUNT(*);
```

2. Maintain a SubscribersCount property on each User:

(a) ADD a relationship [same as above, but with an additional SET]:

    MATCH (u:User {id:1234}), (v:User {id: 5678})
    CREATE (u)<-[:SUBSCRIBED]-(v)
    SET u.SubscribersCount = COALESCE(u.SubscribersCount, 0) + 1;
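
The question also mentions unsubscribing. A sketch of the reverse operation, assuming the counter has already been initialized by earlier subscribe operations:
```
    // If no such relationship exists, MATCH yields no rows and nothing changes
    MATCH (u:User {id:1234})<-[r:SUBSCRIBED]-(:User {id: 5678})
    DELETE r
    SET u.SubscribersCount = u.SubscribersCount - 1;
```
As far as I know, setting a property makes Neo4j take a write lock on that node until the transaction commits, so concurrent subscribe and unsubscribe operations on the same user are serialized. That is the main cost of this option for very popular users.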

(b) GET the count [constant time complexity]:

    MATCH (u:User {id:1234})
    RETURN u.SubscribersCount;
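
If a stored counter ever drifts from the real number of relationships (for example, after a bulk import or a relationship deleted without the matching SET), it can be recomputed. A sketch, assuming the same SIZE syntax used in option 1b is available in your Neo4j version:
```
    // Overwrite the stored counter with the actual number of incoming SUBSCRIBED relationships
    MATCH (u:User {id:1234})
    SET u.SubscribersCount = SIZE((u)<-[:SUBSCRIBED]-());
```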

Conclusion

Assuming option 1b really does run in constant time once the u node is cached, you should probably always use option 1a to add a SUBSCRIBED relationship and option 1b to get the count of such relationships. Maintaining your own count would probably be slower.

However, as @drgraduss reminds us, if you need to filter the relationship by properties or use labels, then option 1b will not run in constant time.

Some examples:

SIZE(()-[:SUBSCRIBED {prop:val}]->(u))
SIZE((:label)-[:SUBSCRIBED]->(u))

In this case, option 2 may be better, since 2a and 2b run in constant time.
