Reputation: 4795
Indexes in MongoDB should be kept to a minimum, or so I have tried to do after reading this. I also went through this nice presentation, which has some great tips.
I have a question, so let me devise an example:
I need to store data about goods stored in locations which are indexed by catalog.
I have several locations. In each location I can have many goods. Each good can either be in stock or not. Every (location, good, stock) triplet is found in a given catalog.
CATALOG_X: { { location: "...", good: "...", stock_or_not: 0|1 },
{ location: "...", good: "...", stock_or_not: 0|1 },
...
}
e.g.:
CATALOG1: NEWYORK - BREAD - 0 // 0 is out of stock
CATALOG2: LOSANGELES - WATER - 1 // 1 is in stock
One complication: I obtain the status of these by a query which is only by catalog. Like:
getMeStatusOfCatalog("CATALOG1") -> { "NEWYORK - BREAD" : 0 } // location - good : instock
However, an important point: getMeStatusOfCatalog may simply stop returning an answer for a particular good. If I no longer see BREAD, I still need to set BREAD to 0 for that particular catalog. This means that in my "write" operation I need to fetch all the previously "active" goods for a catalog (in_catalog below).
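To make the write step concrete, here is a minimal sketch in plain JavaScript of the comparison described above. The function name planCatalogSync, the MILK sample, and the choice to set in_catalog to 0 for goods that disappeared are assumptions for illustration, not part of the original design.

```javascript
// Hypothetical helper: the name and the record shape are assumptions.
// previouslyActive: "LOCATION - GOOD" keys that had in_catalog = 1 for this catalog.
// latestStatus: what getMeStatusOfCatalog returned, e.g. { "NEWYORK - BREAD": 0 }.
function planCatalogSync(previouslyActive, latestStatus) {
  // Goods present in the latest answer keep their reported stock flag.
  const reported = Object.entries(latestStatus).map(([key, inStock]) => ({
    key,
    in_stock: inStock,
    in_catalog: 1,
  }));
  // Goods that were active before but are missing now are forced to 0.
  const dropped = previouslyActive
    .filter((key) => !Object.prototype.hasOwnProperty.call(latestStatus, key))
    .map((key) => ({ key, in_stock: 0, in_catalog: 0 }));
  return { reported, dropped };
}

// Made-up sample: BREAD was active earlier but is absent from the new answer.
const plan = planCatalogSync(
  ["NEWYORK - BREAD", "NEWYORK - MILK"],
  { "NEWYORK - MILK": 1 }
);
console.log(plan.dropped);
```

The expensive part is obtaining previouslyActive, which is exactly the "fetch by catalog" read that the rest of the question is about.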
My main business objective in my application is that I need to display the breakdown of each location, like:
Location 1 (New York) has: bread which is out of stock, etc
Location 2 (Los Angeles) has: water which is fine, etc
If I set up my _id as "LOCATION:GOOD", then I can query with /^LOCATION/ and get everything I need. Overall:
{
_id: "LOCATION:GOOD"
catalog: "..."
in_stock: 0|1
in_catalog: 0|1
}
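As a rough sketch of the read side, the left-anchored query for one location could be built like this. The helper names are hypothetical, and the trailing ":" is an assumption based on the "LOCATION:GOOD" key format above.

```javascript
// Escape regex metacharacters so a location name is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a filter matching every "LOCATION:GOOD" _id of one location.
// The trailing ":" keeps "NEWYORK" from also matching "NEWYORKCITY:...".
function locationPrefixFilter(location) {
  return { _id: { $regex: "^" + escapeRegex(location) + ":" } };
}

const filter = locationPrefixFilter("NEWYORK");
console.log(filter);
```

In the shell this would be used as db.goods.find(filter), where goods is a made-up collection name. The pattern stays case-sensitive and anchored with ^ so that the _id index can be used for the prefix.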
However, I am troubled by data insertion/update based on getMeStatusOfCatalog. How will I fetch all the documents which need to be updated?
I can't add the catalog to the _id discussed before, since I would no longer be able to use a left-anchored regex. Do I need to make catalog an index (compound, unique)? Like:
{
_id: "LOCATION:GOOD:CATALOG" // unique by itself, repeating catalog to ensure unique, used for fast reads
_catalog: "..." // unique by itself, used for fast writes
in_stock: 0|1
in_catalog: 0|1
}
createIndex( { "_id": 1, "_catalog": 1 }, { unique: true } )
In this way I sacrifice space (double index) to have faster queries.
How would you design this so as to have both performant reads and writes?
Upvotes: 3
Views: 122
Reputation: 11190
There is always a trade-off between insertion and query performance. As you add indexes to optimize your queries, you will impact insertion rates.
However, unless your use case is a web scale transaction or event processing system (thousands of transactions per second), you are unlikely to notice insertion performance issues due to one or two compound indexes. You may be making your solution more complex in order to solve a problem you don't have.
In fact, your proposed solution is inadvertently making an insertion problem more likely!
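It is true that, if possible, you should use the _id field. However, as implemented, the default _id field has two important properties: it is unique, and its values increase roughly monotonically, because a default ObjectId starts with a timestamp.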
When MongoDB performs an insert or an update, it places the document on disk based on the value of _id (this behavior varies by storage engine, with MMAPv1 being especially susceptible). In your implementation, you are losing this second property, so it is entirely possible that, at scale, a single insert or update may force MongoDB to reorganize your data on disk, which will be extremely slow.
If you carefully read the presentation you reference, you'll see that their choice of a key for _id maintains both these properties!
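To illustrate the ordering point: a default _id is an ObjectId whose first four bytes hold its creation time in seconds, so newer documents get larger keys. The sketch below reads that timestamp from the hex form; the helper name and both sample ids are made up for illustration.

```javascript
// Read the creation time (seconds since the Unix epoch) from an ObjectId hex string.
function objectIdSeconds(hex) {
  return parseInt(hex.slice(0, 8), 16);
}

// Made-up ids: only the leading timestamp digits differ.
const earlier = "507f1f77bcf86cd799439011";
const later = "507f1f78bcf86cd799439011";
console.log(objectIdSeconds(earlier) < objectIdSeconds(later)); // true
```

A key such as "LOCATION:GOOD" has no such ordering, so new keys land at arbitrary positions in the _id index rather than at its end.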
For your application, I would leave _id alone and instead include catalog, location, and good as document properties in a compound index. Yes, it will marginally increase storage requirements and imperceptibly impact insertion performance, but your data set will be very large before you notice a performance issue.
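A minimal sketch of what that could look like, assuming the field names catalog, location, good, in_stock, and in_catalog from the question. The helper buildCatalogOps and the reset-then-upsert strategy are my own illustration of one way to handle goods that vanish from a catalog, not the only option.

```javascript
// Compound index spec; catalog comes first so that "everything in one catalog"
// is served by the index prefix.
const catalogIndex = { catalog: 1, location: 1, good: 1 };

// Build bulkWrite operations for one catalog refresh.
// statuses: array of { location, good, in_stock } reported for the catalog.
function buildCatalogOps(catalog, statuses) {
  // First reset everything currently active in this catalog...
  const ops = [
    {
      updateMany: {
        filter: { catalog, in_catalog: 1 },
        update: { $set: { in_stock: 0, in_catalog: 0 } },
      },
    },
  ];
  // ...then upsert each reported good, which re-activates it.
  for (const { location, good, in_stock } of statuses) {
    ops.push({
      updateOne: {
        filter: { catalog, location, good },
        update: { $set: { in_stock, in_catalog: 1 } },
        upsert: true,
      },
    });
  }
  return ops;
}

const ops = buildCatalogOps("CATALOG1", [
  { location: "NEWYORK", good: "BREAD", in_stock: 0 },
]);
console.log(JSON.stringify(ops, null, 2));
```

In the shell this would be roughly db.goods.createIndex(catalogIndex, { unique: true }) followed by db.goods.bulkWrite(ops), with goods being a hypothetical collection name. bulkWrite runs operations in order by default, so the reset happens before the upserts, but the batch is not atomic across documents. Note also that a query by location alone cannot use the prefix of this index, so the per-location breakdown may need a second index that starts with location.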
I also think you are artificially impacting your query performance by using a regex search. While the feature is there, MongoDB isn't built to be a full-text-search engine. If you need something like that (Google-like queries), then consider using ElasticSearch or related technologies.
Finally, should you indeed need web scale insertion performance while maintaining query performance, you might find yourself making trade-offs elsewhere. For instance, you could insert or update the documents with no indexes, and then use the oplog to keep a "read-only" indexed version in sync. There are some really innovative architectures that marry MongoDB and ElasticSearch in this way and provide excellent performance (for instance: http://blog.mpayetta.com/elasticsearch/mongodb/2016/08/04/full-text-indexing-with-elastic-search-and-mongodb/).
Upvotes: 3