JoeSlav

Reputation: 4815

MongoDB performance - how many databases, collections?

I am looking to use MongoDB to store time-series data. For the sake of discussion, imagine I have a finite number of sensors deployed (e.g. 10, 100, or 1000 sensors). Each sensor has a dozen "metrics" (e.g. temp, humidity, etc.) which are collected every minute and then stored. A front end then displays charts for each sensor, or aggregates over selected intervals.

What is the best approach, performance wise, to store this? Specifically:

Thanks a lot

Upvotes: 3

Views: 455

Answers (1)

Atish

Reputation: 4425

Approach 1(A): Creating a single database for everything (with a single collection)

Pros:

  • Less maintenance: backups, database user creation, restores, etc.

Cons:

  • You may see a database-level lock when creating indexes on a large database
  • To operate on a specific sensor's data, you need additional indexes so that queries fetch only that sensor's documents
  • A single collection can have no more than 64 indexes, although needing that many would point to a poor indexing strategy anyway.
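As a rough illustration of the single-collection layout, one document per sensor per minute can be served by one compound index instead of many single-field ones. The field names (`sensor_id`, `ts`, `metrics`) and the helper functions are assumptions made for this sketch, not something taken from the question. It only builds plain Python structures, so it runs without a MongoDB server:

```python
from datetime import datetime, timezone

# Hypothetical compound index: equality on the sensor first, then a time range.
# With pymongo, a key list like this could be passed to Collection.create_index().
READINGS_INDEX = [("sensor_id", 1), ("ts", 1)]


def make_reading(sensor_id, ts, metrics):
    """Build one document holding all metrics of a sensor for one minute."""
    return {"sensor_id": sensor_id, "ts": ts, "metrics": dict(metrics)}


def range_filter(sensor_id, start, end):
    """Build a query filter for one sensor over the half-open interval [start, end)."""
    return {"sensor_id": sensor_id, "ts": {"$gte": start, "$lt": end}}


doc = make_reading(
    "sensor-042",
    datetime(2020, 1, 1, 12, 0, tzinfo=timezone.utc),
    {"temp": 21.5, "humidity": 40.2},
)
query = range_filter(
    "sensor-042",
    datetime(2020, 1, 1, tzinfo=timezone.utc),
    datetime(2020, 1, 2, tzinfo=timezone.utc),
)
```

Because the filter's fields match the index prefix, a query like this would not need a separate index per metric, which keeps the index count far below the per-collection limit.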

Approach 1(B): Creating a single database for everything (with one collection per sensor)

Pros:

  • Less maintenance: backups, database user creation, restores, etc.
  • Minimizes the need for indexes that pick out sensor-specific data from one monolithic collection
  • Every sensor-specific query targets only that sensor's collection, so it does not need to pull as large a working set into memory as a query against a single large collection would.
  • Building an index on a relatively small collection is more feasible than building one on a single large collection

Cons:

  • You may end up creating too many indexes (counting the indexes on all collections together).
  • More maintenance is required for a large number of indexes.
  • WiredTiger internally creates one file per collection and one per index. If your use case grows to a large number of sensors, you may hit the 64K open file limit.
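The file-count concern can be estimated with simple arithmetic. The sketch below assumes one file per collection plus one per index (including the mandatory `_id` index), and treats the limit as 64,000 for illustration. Both function names are hypothetical, and the estimate ignores MongoDB's own internal files and other file descriptors such as sockets:

```python
def estimated_wiredtiger_files(num_collections, secondary_indexes_per_collection):
    """Rough count of data files: one per collection plus one per index.

    Every collection also carries the mandatory _id index, hence the extra 1.
    """
    indexes_per_collection = 1 + secondary_indexes_per_collection
    return num_collections * (1 + indexes_per_collection)


def fits_open_file_limit(num_files, limit=64_000):
    """Compare against an assumed open-file limit (64,000 is an assumption)."""
    return num_files < limit


for sensors in (100, 1_000, 25_000):
    files = estimated_wiredtiger_files(sensors, secondary_indexes_per_collection=1)
    print(sensors, files, fits_open_file_limit(files))
```

With one secondary index per collection, each sensor costs three files, so the estimate stays comfortable at 1,000 sensors and only crosses the assumed limit in the tens of thousands.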

Performance-wise, does it matter if I partition the data by sensor or by metric?

  • This depends on the access patterns expected from your analytics app.

Performance-wise, should I make a collection just for the sensor info and then collections for the data, or just merge the two in the same collection?

  • Keeping sensor metadata in one collection and sensor data in another may be worthwhile. It avoids duplicating the sensor metadata in every collected reading.

  • You may like to read Williams blog post here on designing this pattern.
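The metadata/data split described above might look like the following. All field names and values are made up for illustration, and the join is done in application code on plain Python lists standing in for two collections:

```python
# Hypothetical sensor metadata collection: one document per sensor.
sensors = [
    {"_id": "sensor-001", "location": "roof", "model": "X1"},
    {"_id": "sensor-002", "location": "basement", "model": "X2"},
]

# Hypothetical readings collection: each reading references its sensor by id only.
readings = [
    {"sensor_id": "sensor-001", "ts": "2020-01-01T12:00:00Z", "temp": 21.5},
    {"sensor_id": "sensor-001", "ts": "2020-01-01T12:01:00Z", "temp": 21.7},
    {"sensor_id": "sensor-002", "ts": "2020-01-01T12:00:00Z", "temp": 17.0},
]


def attach_metadata(readings, sensors):
    """Join readings with their sensor metadata in application code."""
    by_id = {s["_id"]: s for s in sensors}
    return [{**r, "sensor": by_id[r["sensor_id"]]} for r in readings]


enriched = attach_metadata(readings, sensors)
```

The readings stay small because location and model are stored once per sensor, and the front end can attach them only when a chart actually needs them.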

As always, it's better to design a sample schema and test your queries within your test environment.

Upvotes: 4
