ELP24

Reputation: 61

Find documents in MongoDB with non-typical limit

I have a problem and no idea how to solve it. I have a PointValues collection in MongoDB. The PointValue schema has three parameters:

  • dataPoint (ref to DataPoint schema)
  • value (Number)
  • time (Date)

There is one pointValue for every hour (24 per day).

I have an API method to get PointValues for a specified DataPoint and time range. The problem is that I need to limit the result to at most 1000 points. The typical limit(1000) method isn't a good fit, because I need points spread across the whole specified time range, with a time step that depends on the length of the range and the number of point values in it.

So... for example:

Requesting data for 1 year gives 1 * 365 * 24 = 8760 hourly values. The API should return 1000 values, i.e. approximately one value per (24 / (1000 / 365)) = ~9 hours.
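The step arithmetic above can be checked with a quick calculation (variable names here are just illustrative):

```javascript
// Sampling step for a one-year range capped at 1000 points.
const maxPoints = 1000;
const hoursPerYear = 365 * 24;               // 8760 hourly PointValues
const stepHours = hoursPerYear / maxPoints;  // 8.76 hours between kept points
console.log(Math.round(stepHours));          // → 9
```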

I have no idea which method I should use to filter that data in MongoDB.

Thanks for any help.

Upvotes: 1

Views: 51

Answers (1)

Philipp

Reputation: 69663

Sampling exactly like that in the database would be quite hard to do and likely not very performant. But an option which gives you a similar result is an aggregation pipeline that $group's documents by $year, $dayOfYear, and $hour (and $minute and $second if you need smaller intervals) and keeps the $first value in each group. That way you can sample values by time steps, but your choice of step length is limited to the date operators available: "hourly" samples are easy, but "9-hourly" samples get complicated. When this query is performance-critical and frequent, you might want to consider creating additional collections with daily, hourly, minutely etc. DataPoints so you don't need to perform that aggregation on every request.
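As a rough sketch, the hourly-sampling pipeline described above might look like this. It assumes a Mongoose model `PointValue` with the fields from the question (`dataPoint`, `time`, `value`); the function name and the `dataPointId`/`from`/`to` parameters are placeholders, not part of any real API:

```javascript
// Build an aggregation pipeline that keeps at most one PointValue per hour
// inside the requested range. `dataPointId`, `from`, and `to` are
// caller-supplied placeholders.
function hourlySamplePipeline(dataPointId, from, to) {
  return [
    // Restrict to the requested DataPoint and time range first,
    // so an index on `time` can be used.
    { $match: { dataPoint: dataPointId, time: { $gte: from, $lte: to } } },
    // Sort so $first deterministically picks the earliest value per hour.
    { $sort: { time: 1 } },
    // One bucket per (year, dayOfYear, hour).
    { $group: {
        _id: {
          year: { $year: '$time' },
          dayOfYear: { $dayOfYear: '$time' },
          hour: { $hour: '$time' },
        },
        time: { $first: '$time' },
        value: { $first: '$value' },
    } },
    // $group does not preserve order, so sort the samples again.
    { $sort: { time: 1 } },
  ];
}

// Usage (hypothetical model): PointValue.aggregate(hourlySamplePipeline(id, from, to))
```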

But your documents are quite lightweight, since the actual payload lives in a different collection. So you might consider fetching all results in the requested time range and doing the skipping on the application layer. You can also combine this with the aggregation described above to pre-reduce the dataset: first use an aggregation pipeline to get hourly results into the application, then step through the result set in increments of 9 documents. Whether or not this makes sense depends on how many documents you expect.

Also remember to create a sorted index on the time-field.

Upvotes: 1

Related Questions