Reputation: 29
Problem: I have a MongoDB collection TrackingReport
with 4 M+ records (expected to grow to billions) that stores user location tracking data. I need to get the latest tracking record for each user efficiently.
Current indexes:
created_at: -1
user_id: 1
Current pipeline:
[
{
$sort: { created_at: -1 }
},
{
$group: {
_id: "$user_id",
latestTracking: { $first: "$$ROOT" }
}
},
{
$project: {
_id: 0,
id: "$_id",
name: "$latestTracking.user.name",
code: "$latestTracking.user.code",
phone: "$latestTracking.user.phone",
propic_url: "$latestTracking.user.propic_url",
latest_tracking_date: "$latestTracking.created_at",
latest_tracking: {
user_id: "$latestTracking.user_id",
lat: "$latestTracking.lat",
long: "$latestTracking.long",
address: "$latestTracking.address",
battery_percent: "$latestTracking.battery_percent",
created_at: "$latestTracking.created_at",
updated_at: "$latestTracking.updated_at"
}
}
},
{ $sort: { latest_tracking_date: -1 } },
{ $skip: 0 },
{ $limit: 15 }
]
Performance issue
The initial $sort
stage is causing significant performance issues. The pipeline needs to sort 4M+ records before grouping by user_id
.
Questions:
Additional context 4. Collection size: 4M+ records (growing to billions) 5. Need: Latest tracking record per user with pagination 6. Current bottleneck: Initial sort operation
Additional Google Doc For Better Understand The Problem
https://docs.google.com/document/d/1QGnPSqHFWW0T6sFO6rkH8NsHhfpsZK9Hnfjhu_6FbBQ/edit?usp=sharing
Upvotes: 0
Views: 37