FishingIsLife

Reputation: 2372

MongoDB best practice to store data

I've read some of the MongoDB documentation but wasn't able to find an answer to my question. I'm developing an application in which I want to store JSON documents. I've read about indexes and so on, but one question remains. The data I want to store contains information that the client does not need to load as a whole. So I planned to normalize the data, split my big JSON document into smaller ones, and serve each through a separate REST endpoint. Now I was thinking about creating a different collection for each group of JSON documents. The reason is that I want to reduce the search space compared to storing everything in one collection. So each user would have 5 collections, and I expect 1 million users. Is this a good solution in terms of performance and scaling? Is querying multiple collections more expensive than querying one?

Upvotes: 1

Views: 1444

Answers (1)

Jayshree Rathod

Reputation: 75

Recently, while working on a project, my team and I faced a situation where we had a huge data set that was expected to grow rapidly.

We had MongoDB in place, and as the data grew, performance started to degrade. The main reason was that the data was spread across multiple collections, so we had to use a lookup to join the collections and get the data.

Interestingly, the way the two collections are mapped to each other plays a very important role in performance.

Our initial structure was: Collection A { "_id" : ..., "info" : [ // list of object ids from the other collection ] }

The info field was used to map to the "_id" of documents in Collection B.
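A minimal sketch of that initial mapping, written in Python for illustration. The sample documents, the collection name "collection_b", and the output field "info_docs" are assumptions, not taken from our project. The pipeline is the kind of thing one could pass to pymongo's Collection.aggregate(); the helper only emulates the join in memory so the shape of the result is visible without a running server.

```python
# Hypothetical sample documents; ids and payloads are made up for illustration.
collection_a = [
    {"_id": "a1", "info": ["b1", "b2"]},
    {"_id": "a2", "info": ["b3"]},
]
collection_b = [
    {"_id": "b1", "payload": "first"},
    {"_id": "b2", "payload": "second"},
    {"_id": "b3", "payload": "third"},
]

# Aggregation pipeline that could be run against Collection A.
# "collection_b" and "info_docs" are assumed names.
lookup_pipeline = [
    {"$match": {"_id": "a1"}},
    {"$lookup": {
        "from": "collection_b",
        "localField": "info",      # array of ids stored in Collection A
        "foreignField": "_id",     # matched against Collection B's _id
        "as": "info_docs",
    }},
]


def emulate_lookup(a_docs, b_docs, a_id):
    """In-memory stand-in for the pipeline above, for illustration only.

    A real $lookup does not promise to return matches in array order;
    this helper simply follows the order of the info array.
    """
    b_by_id = {doc["_id"]: doc for doc in b_docs}
    for a in a_docs:
        if a["_id"] == a_id:
            joined = [b_by_id[i] for i in a["info"] if i in b_by_id]
            return {**a, "info_docs": joined}
    return None


result = emulate_lookup(collection_a, collection_b, "a1")
print([doc["payload"] for doc in result["info_docs"]])
```

With this layout, every read of a Collection A document that needs its related data has to go through the join.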

In our case, even though MongoDB uses _id as the unique identifier, this lookup ended up scanning all documents of Collection B regardless of the indexes we had. If B holds gigabytes or terabytes of data, it takes a very long time to get even one matching document.

So the change we made was: we removed the array of object ids from Collection A and added a new field to Collection B that holds the _id of the corresponding document in Collection A. Long story short, we reversed the mapping.

Then we applied an index on the Collection B fields used in the query. This improved performance a lot.
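The reversed mapping can be sketched roughly as follows, again in Python. The field name "a_id" and the sample documents are hypothetical. The key spec is the kind of value one could pass to pymongo's Collection.create_index(), and the filter is what one could pass to Collection.find(); the dictionary built here is only an in-memory analogy for what an index on that field does.

```python
from collections import defaultdict

# Hypothetical Collection B documents after the change: each one carries
# the _id of its parent document in Collection A in an assumed "a_id" field.
collection_b = [
    {"_id": "b1", "a_id": "a1", "payload": "first"},
    {"_id": "b2", "a_id": "a1", "payload": "second"},
    {"_id": "b3", "a_id": "a2", "payload": "third"},
]

# Key spec for an ascending single-field index on the reference field.
index_keys = [("a_id", 1)]


def query_for(a_id):
    """Filter document selecting all B documents that belong to one A document."""
    return {"a_id": a_id}


def build_index(docs, field):
    """In-memory analogy of an index: group documents by the indexed field."""
    index = defaultdict(list)
    for doc in docs:
        index[doc[field]].append(doc)
    return index


by_parent = build_index(collection_b, "a_id")
print([doc["_id"] for doc in by_parent["a1"]])
```

The point of the design is that the query now filters Collection B directly on an indexed field, instead of resolving a list of ids stored in Collection A.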

So having multiple collections is not a bad idea. With a proper mapping between collections, MongoDB can provide excellent performance. You can also use sharding to improve it further.

Upvotes: 4
