RavenDB data model efficient scalability design choices

Question

I'm using RavenDB on a project that is currently in development, so it has no users yet. Until this project my background was entirely in relational databases, but I much prefer the NoSQL approach in general. However, I don't yet have any experience of working on or managing a site built atop a NoSQL database that gets heavy traffic. I'm starting to get an understanding of Map/Reduce indexes and have included some within my solution, but am wondering:

Are there any design rules of thumb that I should be following about when to create Map/Reduce indexes and when not to?

I know that it is very dependent upon the business objects I have in my system and how they interact with each other. I guess I'm just struggling to see the big picture about which queries I might be making that should use an index, and which I can simply query the objects for directly.

Here's a quick overview of parts of my business domain and where I've created indexes already:

My system consists primarily of brands and consumers. Each of those has many social media accounts. When a user signs in via their social media account, I have indexes, BrandsBySocialAccount and ConsumersBySocialAccount, which flatten those collections and associate them with the UserId of the brand or consumer. Once I have the UserId I can then retrieve the relevant brand or consumer record and away I go.

A brand can create many campaigns. I have another index here, CampaignsByBrand. There's also a requirement to track how consumers interact with campaigns, so a campaign can have many tracking entries, one for each interaction a consumer performs with it. For example, a consumer can follow an external link to a campaign page or discover one from within the site itself. As I explain this it seems clear that I need indexes here. Either I have one index per interaction (ClickLinkTrackingEntriesByCampaign and ViewDetailsTrackingEntriesByCampaign) or a single index (TrackingEntriesByCampaign) that includes the interaction type. Are multiple indexes overkill here? They may be. There are currently 4 types of interaction and others may be introduced later. These queries are very quick when I have only a few records, but will they still be as quick as they can be when there are hundreds of thousands or even millions of records?

Looking at the overall design, it seems that for every object with a collection property that might need to be queried by a property of its items, I should create a Map/Reduce index. Is that a good rule of thumb to follow? Are there others, along the lines of "if you have these types of object interactions, you should be thinking about creating these kinds of indexes"?

Matt Johnson-Pint · Accepted Answer

First, be sure you review the documentation on static indexes if you haven't already.

The main points you need to keep clear are:

Retrieving a document directly from the document store does not require an index, and should be used whenever possible. This is done using any of the following:
- session.Load()
- session.Advanced.LoadStartingWith()
- documentStore.DatabaseCommands.Get()
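
As a minimal sketch of the first option, loading by document id looks roughly like this. The `Brand` class, the `BrandLookup` helper, and the id format are hypothetical, and the `Raven.Client` namespace is an assumption that matches the older client these calls come from (newer clients moved the session types under `Raven.Client.Documents`).

```csharp
using Raven.Client;

// Hypothetical document class, for illustration only.
public class Brand
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public static class BrandLookup
{
    // Fetches a single document by its id, e.g. "brands/1".
    // This goes straight to the document store; no index is queried.
    public static Brand LoadBrand(IDocumentSession session, string brandId)
    {
        return session.Load<Brand>(brandId);
    }
}
```
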

Any time you query using session.Query() or session.Advanced.LuceneQuery(), you are always using an index. If you don't specify a static index, then a dynamic index is created for you. In many cases, the delay involved in creating a dynamic index is less than desirable - so it is usually a good idea to replace dynamic indexes with static ones.

The more indexes you have, the more work the server must do, and the more storage you will consume. Therefore, you will want to consolidate indexes whenever possible. Quite often, the same index can be used for multiple purposes. You should craft your indexes carefully - don't make them too narrow to be useful, and don't make them so broad that they become expensive.

Say I have an object that I need to query by field A sometimes, and by field B other times. Sure, I could create two different indexes, but this would be wasteful. It would be much more efficient to have a single index that maps both A and B fields. Now the two different queries can be served by the same index. I urge you to consolidate your indexes whenever possible.
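
Applied to the tracking entries in the question, a single map-only index over both fields might look like the sketch below. It assumes each tracking entry is stored as its own document; the `TrackingEntry` class, its property names, the index name, and the `"ClickLink"` value are all hypothetical. The namespaces are those of the older client (newer clients use `Raven.Client.Documents.*`).

```csharp
using System.Collections.Generic;
using System.Linq;
using Raven.Client;
using Raven.Client.Indexes;

// Hypothetical document: one tracking entry per consumer interaction.
public class TrackingEntry
{
    public string Id { get; set; }
    public string CampaignId { get; set; }
    public string InteractionType { get; set; } // e.g. "ClickLink", "ViewDetails"
}

// One static, map-only index that covers both fields. It can answer
// queries by campaign, by interaction type, or by both together.
public class TrackingEntries_ByCampaignAndInteraction
    : AbstractIndexCreationTask<TrackingEntry>
{
    public TrackingEntries_ByCampaignAndInteraction()
    {
        Map = entries => from entry in entries
                         select new
                         {
                             entry.CampaignId,
                             entry.InteractionType
                         };
    }
}

public static class TrackingQueries
{
    // Queries the static index explicitly instead of relying on a dynamic one.
    public static List<TrackingEntry> ClicksForCampaign(
        IDocumentSession session, string campaignId)
    {
        return session.Query<TrackingEntry, TrackingEntries_ByCampaignAndInteraction>()
                      .Where(e => e.CampaignId == campaignId
                               && e.InteractionType == "ClickLink")
                      .ToList();
    }
}
```

With this shape, a new interaction type is just a new value in `InteractionType`; no additional index is needed.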

A typical bad example would be to map every field in your document and turn field storage on for all fields, just because you think you might want to project them from the index at some point. In most cases, you don't need to go this far. There are a few places where this is appropriate, but you would want to do it very sparingly.

All indexes have a Map, but we don't call them "map/reduce" indexes until they also have a Reduce section. Most indexes you will create will not be map/reduce indexes.

Map/Reduce indexes are almost always reserved for some type of aggregate calculation. For example, you might have a m/r index for SocialAccountsCountByBrand in your domain, or in a sales domain you might have something more complex like TopCustomersByTotalSalesPerMonth.
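
To make the distinction concrete, here is a sketch of what an aggregate index could look like for the question's domain: a count of tracking entries per campaign and interaction type. The `TrackingEntry` class, the index name, and the `Result` shape are hypothetical, and it again assumes tracking entries are separate documents and that the older `Raven.Client.Indexes` namespace applies.

```csharp
using System.Linq;
using Raven.Client.Indexes;

// Hypothetical document: one tracking entry per consumer interaction.
public class TrackingEntry
{
    public string Id { get; set; }
    public string CampaignId { get; set; }
    public string InteractionType { get; set; }
}

// Map/reduce index: the Map emits one row per entry with Count = 1,
// and the Reduce groups the rows and sums the counts.
public class TrackingEntries_CountByCampaign
    : AbstractIndexCreationTask<TrackingEntry, TrackingEntries_CountByCampaign.Result>
{
    public class Result
    {
        public string CampaignId { get; set; }
        public string InteractionType { get; set; }
        public int Count { get; set; }
    }

    public TrackingEntries_CountByCampaign()
    {
        Map = entries => from entry in entries
                         select new
                         {
                             entry.CampaignId,
                             entry.InteractionType,
                             Count = 1
                         };

        // The Reduce output must have the same shape as the Map output.
        Reduce = results => from result in results
                            group result by new { result.CampaignId, result.InteractionType } into g
                            select new
                            {
                                g.Key.CampaignId,
                                g.Key.InteractionType,
                                Count = g.Sum(x => x.Count)
                            };
    }
}
```
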

I don't agree with your assessment that an object with a collection property needs an index over that collection. In many cases, you will have similar data elsewhere in your domain that can serve the same purpose. The specifics, of course, depend on what you want to do. But in general, if you find you are creating lots of these indexes, you might be better served by refactoring that data into its own document.

For example, what if I had a class like the following?

(intentionally bad example - don't really do this)
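```
using System.Collections.Generic;

public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }

    // Anti-pattern: every order is embedded in the customer document.
    // (Order is the embedded order type; its definition is omitted here.)
    public List<Order> Orders { get; set; }
}
```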
Clearly if every order is embedded in the Customer object, I would be querying into that collection quite frequently. I would be much better served by putting each Order into its own document, referring back to the customer by a CustomerId reference.
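
A sketch of the refactored shape described above, with each order as its own document. The `Total` property and the id formats in the comments are hypothetical; only `CustomerId` comes from the text.

```csharp
// The customer document no longer carries its orders.
public class Customer
{
    public string Id { get; set; }   // e.g. "customers/1"
    public string Name { get; set; }
}

// Each order is stored as a separate document and points back
// to its customer by id.
public class Order
{
    public string Id { get; set; }          // e.g. "orders/1"
    public string CustomerId { get; set; }  // reference to the Customer document
    public decimal Total { get; set; }      // hypothetical field, for illustration
}
```

Orders can then be loaded directly by id, or queried through one index that maps `CustomerId`, without pulling the whole customer document each time.
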

Lastly, try to avoid thinking about indexes based on what you want the shape of the results to be. Instead, think of them based on what you want to query by. In other words, what fields will you want to specify in your Where, OrderBy, or Search clauses in your queries?

Sure, there are techniques such as live projections and TransformResults - but again, these should be used sparingly. One could argue against almost every need for transformation, now that we have more powerful features like indexing related documents. Some minor index projections can be useful, but often you can just manipulate the results in your own code and keep raven out of it. Use projections only when you actually need the data from the index in your results. If all the data you need is in the document, then there's no need to project.
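
For reference, indexing related documents means calling `LoadDocument` inside the Map, so a field from another document becomes queryable without a separate transformation step. The sketch below is only an illustration: the `Brand` and `Campaign` classes, their properties, and the index name are hypothetical, and it assumes a client version that supports `LoadDocument` in index definitions.

```csharp
using System.Linq;
using Raven.Client.Indexes;

// Hypothetical documents, for illustration only.
public class Brand
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class Campaign
{
    public string Id { get; set; }
    public string BrandId { get; set; }
    public string Title { get; set; }
}

// Indexes each campaign together with the name of its related brand,
// so campaigns can be queried by BrandName as well as BrandId.
public class Campaigns_ByBrandName : AbstractIndexCreationTask<Campaign>
{
    public Campaigns_ByBrandName()
    {
        Map = campaigns => from campaign in campaigns
                           let brand = LoadDocument<Brand>(campaign.BrandId)
                           select new
                           {
                               campaign.BrandId,
                               BrandName = brand.Name
                           };
    }
}
```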

I bring this point up because I have seen many cases of people designing their indexes based on the ViewModels in their UI. This is bad, as it places a requirement that the indexes be crafted for UI concerns. One should instead be thinking about the shape of the result itself. If it has all of the information to answer the query, then it could be used in a multitude of ways - including, but not limited to, the UI.

I hope this answers your questions. If you have others, respond in comments. Thanks.
