ptgamr

Reputation: 594

Filter large dataset based on aggregation result

I need to build a sort of "Advanced Search" functionality with MongoDB. It's a sports system, where player statistics are collected for each season like this:

{
  player: {
    id: int,
    name: string
  },
  goals: int,
  season: int
}

Users can search data across seasons, for example: find players who scored > 30 goals from season 2012 to 2016.

I could use mongodb aggregation:

db.stats.aggregate( [
 { $match: { season: { $gte: 2012, $lte: 2016 } } },
 { $group: { _id: "$player", totalGoals: { $sum: "$goals" } } },
 { $match: { totalGoals: { $gte: 30 } } },
 { $skip: 0 },
 { $limit: 10 }
] )

That works fine; the speed is acceptable for a collection with more than 3 million records.

However, if the user wants to search across a larger range of seasons, say a player's lifetime statistics, the aggregation becomes very slow. I understand that MongoDB has to go through all the matching documents and calculate totalGoals for each player.
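
For reference, the plan can be inspected by passing the explain option to aggregate, which returns the query plan instead of the results (a quick check in the mongo shell):

db.stats.aggregate(
  [
    { $match: { season: { $gte: 2012, $lte: 2016 } } },
    { $group: { _id: "$player", totalGoals: { $sum: "$goals" } } },
    { $match: { totalGoals: { $gte: 30 } } }
  ],
  { explain: true }  // only the first $match can use an index; the rest runs in memory
)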

I just wonder if there is a better approach that could solve this performance problem?

Upvotes: 2

Views: 144

Answers (1)

profesor79

Reputation: 9473

  1. You can have pre-calculated data for past seasons and make a two-step query (see the sketches after this list):

    a) get the past data, b) get the current data

  2. You could try to optimise the indexes on that query.

  3. Hardware: use SSDs.

  4. Hardware: add more memory.

  5. Introduce sharding to split the load.
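
For 1., here is a minimal sketch, assuming a pre-aggregated summary collection (the name career_totals and the season cutoffs are illustrative, not part of the question):

// Offline job, re-run whenever a season closes: roll up all past seasons
// into a small summary collection via $out.
db.stats.aggregate( [
  { $match: { season: { $lte: 2015 } } },
  { $group: { _id: "$player", totalGoals: { $sum: "$goals" } } },
  { $out: "career_totals" }
] )

// At query time: a) read the pre-calculated past data, b) aggregate only
// the current season (a small slice of the collection), then merge the two
// per player in application code before applying the totalGoals >= 30 filter.
var past = db.career_totals.find().toArray();
var current = db.stats.aggregate( [
  { $match: { season: 2016 } },
  { $group: { _id: "$player", totalGoals: { $sum: "$goals" } } }
] ).toArray();

For 2., the only stage in this pipeline that can use an index is the initial $match on season, so that is where an index pays off; a sketch:

// Lets the first $match do an index range scan instead of a full
// collection scan; the $group stage is still computed in memory.
db.stats.createIndex( { season: 1 } )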

Upvotes: 1
