Reputation: 4304
I'm working on porting a database to MongoDB and have run into some problems with the document size limit.
My understanding is that if you're going to always view one entity in the context of another entity, that embedding is the way to go.
However the data (genomic) has so many entities of each type, that even just storing the _id field in the embedded document puts me over the 16 MB size limit:
Genome
{
...
has_reactions:[id1, id2, ... idn] // Where n is really large
}
I've also tried modelling it the other way, but hit the same limitation:
Reaction
{
...
in_genomes:[id1, id2, ... idn] // Still really large
}
The MongoDB documentation gives great examples for one-to-one, and one-to-many relations, but doesn't have much to say on many-to-many.
In traditional SQL, I'd model this with a Genome
, Reaction
, and GenomeReaction
set of tables. Is that the only way to go here as well?
Edit:
As for the background, reaction is a metabolic reaction, though it doesn't really matter what genomes and reactions mean in this context. It could just as well be a relationship between the types of gaskets in each of my widgets. It's a standard many-to-many relationship where both instances of "many" can be a very large number.
I'm aware that Mongo doesn't allow joins, but that's easily solved with using multiple queries, which is the recommended way of handling document references in Mongo.
We haven't chosen Mongo as a solution, we're just evaluating it as a possible solution. It looked attractive because it is billed as being able to handle "huMONGOus datasets", so I was a bit surprised by this limitation.
In all of our other use cases, Mongo has worked well. It's just this particular relationship that I'm unable to port from mysql to mongo without using a Genome
, Reaction
, and GenomeReaction
set of collections. I can easily do this, but I was hoping that there was a more mongoy way to handle it.
Perhaps mongo doesn't handle many-to-many relationships well, which would explain its conspicuous absence from the list of data model scenarios in its docs.
Upvotes: 2
Views: 564
Reputation: 4304
After asking about this on the official mongo-db mailing list, I discovered that the recommended way to handle scenarios like this is to either use the three collection mapping I mentioned in my original post, or to use a "hybrid schema design" where one of the collections is stored in buckets.
So you'd have something like:
// genomes collection
{
_id: 1,
genome_thingee: 'blah blah'
...
}
// reaction_buckets collection
{
_id: ObjectId(...),
genome_id: 1,
count: 100,
reactions: [
{ reaction-key1: value, reaction-key 2: value},
{ reaction-key1: value, reaction-key 2: value},
{ reaction-key1: value, reaction-key 2: value},
{ reaction-key1: value, reaction-key 2: value},
{ reaction-key1: value, reaction-key 2: value},
...]
}
As you might imagine, there are all kinds of implications to this kind of model that your application has to take into account when adding or querying data.
While in the end this approach doesn't really appeal to me (and thus I decided to look at Neo4j at @Philipp's suggestion), I thought I'd post the solution in case anyone else needs to solve a similar problem and doesn't mind the hybrid/bucket approach.
Upvotes: 2