Reputation: 570
I have one big mongodb collection (3-million docs, 50 GigaBytes), and it would be very slow to query the data even I have created the indexs.
db.collection.find({"C123":1, "C122":2})
e.g. the query will be timeout or will be extreme slow (10s at least), even if I have created the separate indexes for C123
and C122
.
Should I create more indexs or increase the physical memory to accelerate the querying?
Upvotes: 6
Views: 22899
Reputation: 883
Learned it hard way..
Upvotes: 2
Reputation: 51
Use skip
and limit
. Run a loop for 50000 data at once .
https://docs.mongodb.com/manual/reference/method/cursor.skip/
https://docs.mongodb.com/manual/reference/method/cursor.limit/ example :
[
{
$group: {
_id: "$myDoc,homepage_domain",
count: {$sum: 1},
entry: {
$push: {
location_city: "$myDoc.location_city",
homepage_domain: "$myDoc.homepage_domain",
country: "$myDoc.country",
employee_linkedin: "$myDoc.employee_linkedin",
linkedin_url: "$myDoc.inkedin_url",
homepage_url: "$myDoc.homepage_url",
industry: "$myDoc.industry",
read_at: "$myDoc.read_at"
}
}
}
}, {
$limit : 50000
}, {
$skip: 50000
}
],
{
allowDiskUse: true
},
print(
db.Or9.insert({
"HomepageDomain":myDoc.homepage_domain,
"location_city":myDoc.location_city
})
)
Upvotes: 1
Reputation: 1199
For such a query you should create compound indexes. One on both fields. And then it should be very efficient. Creating separate indexes won't help you much, because MongoDB engine will use first to get results of first part of query, but second if is used won't help much (or even can slow down in some cases your query because of lookup in indexes table and then in real data again). You can confirm used indexes by using .explain() on your query in shell.
See compound indexes:
https://docs.mongodb.com/manual/core/index-compound/
Also consider sorting directions on both your fields while making indexes.
Upvotes: 9
Reputation: 230286
The answer is really simple.
You don't need to create more indexes, you need to create the right indexes. Index on field c124
won't help queries on field c123
, so no point in creating it.
Use better/more hardware. More RAM, more machines (sharding).
Upvotes: 2