tigrou83

Reputation: 157

Neo4j data modeling: private owned nodes, rich relationships, locks

Versions used: Neo4j 3.0.6 with Spring-data-neo4j 4.2.0.M1 for POJO mapping

I'm trying to choose how to model data with Neo4j and to compare the benefits and drawbacks of different solutions.

Requirements:

Movie metadata example:

Movie metadata
  locale 'en_GB':
    title: 'Jurassic Park'
    description: 'description in english'
  locale 'fr_FR':
    description: 'description en francais'
  locale 'none':
    actor: 'Jeff Goldblum'

Solution A

Solution B

Does anyone have experience with solution B? How bad is it to need to lock a node that will be shared by millions of other nodes? What is the impact on performance and scalability?

Does anyone have a better modeling solution?

Upvotes: 0

Views: 285

Answers (2)

Tore Eschliman

Reputation: 2507

tl;dr: go with approach A. Don't bother with orphaned :Locale nodes except for periodic cleanup; they will have no meaningful effect on query performance.

Your approach 'A' is by far the better solution. You do need to move that data off of the :Movie node, you are correct, because it will have to be either a nested Map or a list of Maps, neither of which is supported by Node properties. For storage, you could convert these to a Map of lists, but that will be very difficult to query, much less query quickly. Your concern about "orphaned" nodes is insubstantial; it will affect query performance and data size trivially if at all, and is incredibly easy to clean up periodically to ease your mind in any case.

MATCH (x:Locale) WHERE NOT (x) <- [:METADATA] - () DETACH DELETE x

Do that once a month, or never; it really won't affect you much. Your query is already constrained by the rest of the path, so unless orphaned :Locale nodes are going to substantially outnumber attached ones, you're only adding a small percentage to what is already likely the largest set in your query, and those extra nodes will be dropped on the first pass of the query anyway.

As for locking, it will only affect write queries anyway, and only while a write transaction is open. You can run a million read-only queries while the write is going on and nothing will be affected. Despite that, the second model is susceptible to slow query performance, because as mentioned above, you can't put indexes on relationship properties.

Upvotes: 3

cybersam

Reputation: 66999

You can just store the "metadata" directly as properties of each Movie node (without resorting to key and value). This is the simplest approach, which avoids locking concerns and minimizes the number of nodes and relationships required. You can freely add more properties to a node at any time. This approach also lets you add indexes on the specific Movie properties that you need to access quickly when kicking off your queries.

For example:

CREATE (m:Movie {id: 123, title: 'Men in black', director: 'Barry Sonnenfeld'});
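To illustrate the indexing point, here is a minimal sketch using the Neo4j 3.0 index syntax. Indexing `title` is only an assumption about which property you would look up by; pick whichever properties your queries actually start from.

```cypher
// Assumption: lookups start from the title property.
CREATE INDEX ON :Movie(title);

// A lookup like this can then use the index instead of scanning all :Movie nodes.
MATCH (m:Movie {title: 'Men in black'})
RETURN m.id, m.director;
```

Note that the `CREATE INDEX ON` form is the 3.x syntax; later major versions changed it.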

[UPDATE]

If you need to keep your "metadata" cleanly separated from your "data" and you also need to be able to localize the metadata (including the specification of a locale property), then you can associate each Movie node with a single Metadata node for each locale. A Metadata node would directly contain all the metadata properties for a single locale for a specific Movie node.
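As a rough sketch of that layout, the following attaches one Metadata node per locale to an existing Movie node. The `METADATA` relationship type and the property values are assumptions for illustration only; use whatever names fit your model.

```cypher
// Assumption: the relationship type is called METADATA.
// MERGE creates the locale-specific Metadata node only if it is not already attached.
MATCH (m:Movie {id: 123})
MERGE (m)-[:METADATA]->(md:Metadata {locale: 'en_GB'})
SET md.title = 'Men in black',
    md.description = 'description in english';

// Read back the metadata for one locale of one movie.
MATCH (m:Movie {id: 123})-[:METADATA]->(md:Metadata {locale: 'en_GB'})
RETURN md.title, md.description;
```

Because each Metadata node belongs to exactly one Movie, writes to it do not contend on a node shared across movies.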

Cypher can be used to perform "cascading deletes". For example:

MATCH (m:Movie {id: 123})
OPTIONAL MATCH p=(m)-->()
DELETE p;
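One caveat with deleting the matched path: if the movie has no outgoing relationships, `p` is null and nothing is deleted, and the delete fails if a matched node still has other relationships. A possible variant, assuming the metadata nodes carry a `Metadata` label and are not connected to anything you want to keep, is:

```cypher
// Assumption: metadata nodes are labeled :Metadata and belong only to this movie.
// DETACH DELETE removes each node together with its relationships;
// a null md (movie without metadata) is simply ignored.
MATCH (m:Movie {id: 123})
OPTIONAL MATCH (m)-->(md:Metadata)
DETACH DELETE m, md;
```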

Upvotes: 1
