drgraduss

Reputation: 381

Neo4j/Cypher _LOCK_ good or bad?

Suppose you have a (:User) node with a SubscribersCount property.

Each time someone subscribes to or unsubscribes from a user, User.SubscribersCount should be updated accordingly. A [:SUBSCRIBED] relationship will also be created or deleted on such an action.

In this case, to update the counter you can:

The first approach will degrade as the user's number of subscribers grows.

What about the second approach? What are its downsides?

Upvotes: 1

Views: 133

Answers (1)

cybersam

Reputation: 67044

[UPDATED (thrice)]

To start the discussion, here are example Cypher queries for your 2 options.

  1. Calculate the relationship count as needed (without a SubscribersCount property):

    (a) ADD a relationship:

        MATCH (u:User {id:1234}), (v:User {id: 5678})
        CREATE (u)<-[:SUBSCRIBED]-(v);
    

    (b) GET the count using SIZE [according to @NicoleWhite, this should execute in constant time as long as the pattern used does not specify a label for the subscriber's node and u is already cached]:

        MATCH (u:User {id:1234})
        RETURN SIZE((u)<-[:SUBSCRIBED]-());
    

    (c) (deprecated) GET the count using COUNT [Neo4j has to iterate through all relationships (of all types) for that user]:

        MATCH (u:User {id:1234})<-[:SUBSCRIBED]-()
        RETURN COUNT(*);
    
  2. Maintain a SubscribersCount property on each User:

    (a) ADD a relationship [same as above, but with an additional SET]:

        MATCH (u:User {id:1234}), (v:User {id: 5678})
        CREATE (u)<-[:SUBSCRIBED]-(v)
        SET u.SubscribersCount = u.SubscribersCount + 1;
    

    (b) GET the count [constant time complexity]:

        MATCH (u:User {id:1234})
        RETURN u.SubscribersCount;
    
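The question also covers unsubscribing, so here is a minimal sketch of how option 2 might handle both directions. It reuses the ids from above and makes two assumptions that are not in the original queries: MERGE is swapped in for CREATE so that a repeated subscribe does not add a second relationship or count twice, and COALESCE guards against a SubscribersCount that was never initialized.

    // Subscribe: create the relationship only if it is missing, and
    // increment the counter only when it was actually created.
    MATCH (u:User {id:1234}), (v:User {id: 5678})
    MERGE (u)<-[:SUBSCRIBED]-(v)
      ON CREATE SET u.SubscribersCount = COALESCE(u.SubscribersCount, 0) + 1;

    // Unsubscribe: delete the relationship and decrement the counter.
    // Assumes at most one SUBSCRIBED relationship exists from v to u;
    // if none is found, nothing is deleted and the counter is untouched.
    MATCH (u:User {id:1234})<-[r:SUBSCRIBED]-(v:User {id: 5678})
    DELETE r
    SET u.SubscribersCount = COALESCE(u.SubscribersCount, 1) - 1;

Both statements change a property on u, and Neo4j takes a write lock on a node whose property is modified and holds it until the transaction ends. Concurrent subscriptions to the same popular user would therefore be serialized on that node, which is one possible downside of maintaining the counter yourself.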

Conclusion

Assuming that it is true that option 1b does perform in constant time once the u node is cached, then you should probably always use option 1a to add a SUBSCRIBED relationship and 1b to get the count of such relationships. Maintaining your own count would probably be slower.

However, as @drgraduss reminds us, if you need to filter the relationship by properties or use labels, then option 1b will not run in constant time.

  • Some examples:

    SIZE(()-[:SUBSCRIBED {prop:val}]->(u))
    SIZE((:label)-[:SUBSCRIBED]->(u))
    
  • In this case, option 2 may be better, since 2a and 2b run in constant time.
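
One way to check these claims against your own data is to prefix each variant with PROFILE and compare the db hits reported in the plan. This is only a sketch: it reuses the id from above, assumes a Neo4j version that still accepts SIZE over a pattern, and uses prop and 'val' as hypothetical stand-ins for a real relationship property and value.

    // Unfiltered: should be answerable from the node's stored degree.
    PROFILE
    MATCH (u:User {id:1234})
    RETURN SIZE((u)<-[:SUBSCRIBED]-());

    // Filtered by a relationship property (hypothetical prop/'val'):
    // each SUBSCRIBED relationship has to be inspected.
    PROFILE
    MATCH (u:User {id:1234})
    RETURN SIZE(()-[:SUBSCRIBED {prop: 'val'}]->(u));

    // Filtered by the subscriber's label: each subscriber node
    // has to be inspected.
    PROFILE
    MATCH (u:User {id:1234})
    RETURN SIZE((:User)-[:SUBSCRIBED]->(u));

If the db hits for the two filtered variants grow with the number of subscribers while the first stays flat, that matches the behavior described above.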

Upvotes: 3
