Reputation: 17072
In a MongoDB database, I have about 4 million documents (each document consists of a timestamp and a value).
I have a use case where I need to be able to query all the documents through a REST API. I made several tests with Sails.js, using either sails-mongo or node-mongodb-native directly in a controller, but neither of those two solutions works: the process hangs and never comes back.
If I use the mongo shell directly I can query the whole collection (of course it takes a while, but that's a lot of data).
1st Case: from mongo shell
var v= db.data.find()
v.length() // returns 4280183 in something like 30 seconds
In mongodb.log I can see all the 'getmore' entries with the number of items returned.
2nd case: from my sails controller (using node-mongodb-native)
// TEST WITH MONGODB NATIVE
native_find: function(req, res) {
  var MongoClient = require('mongodb').MongoClient;
  var url = 'mongodb://localhost:27017/jupiter';
  MongoClient.connect(url, function(err, db) {
    console.log("Connected correctly to server");
    var collection = db.collection('data');
    // Find all data
    collection.find({}).toArray(function(err, d) {
      db.close();
      res.json(d);
    });
  });
}
The process is triggered and mongo seems to work, but after a while I get the following error:
$ curl 'http://192.168.1.143:8000/native_find'
curl: (52) Empty reply from server
If I check the mongo log, I can see some getmore entries, but not enough to have covered the whole collection.
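For what it's worth, one way around the toArray() call (which tries to buffer all 4 million documents in memory before responding) is to stream the cursor into the HTTP response chunk by chunk. A minimal sketch, assuming an already-connected mongodb 2.x Db instance (e.g. from MongoClient.connect as above) and the same 'data' collection; streamCollection and jsonArrayChunk are illustrative names, not part of any library:

```javascript
// Pure helper: turn one document into its chunk of a JSON array body.
function jsonArrayChunk(doc, isFirst) {
  return (isFirst ? '' : ',') + JSON.stringify(doc);
}

// Stream every document of `collectionName` into the response as one
// JSON array, without ever holding the whole result set in memory.
function streamCollection(db, collectionName, res) {
  var stream = db.collection(collectionName).find({}).stream();
  var first = true;

  res.setHeader('Content-Type', 'application/json');
  res.write('[');

  stream.on('data', function(doc) {
    res.write(jsonArrayChunk(doc, first));
    first = false;
  });

  stream.on('end', function() {
    res.end(']');
    db.close();
  });

  stream.on('error', function(streamErr) {
    db.close();
    res.statusCode = 500;
    res.end();
  });
}
```

This keeps memory usage roughly constant regardless of collection size, since each document is serialized and flushed as it arrives from the cursor.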
3rd case: from sails controller through sails-mongo ORM
// TEST WITH SAILS-MONGO
sailsmongo_find: function(req, res) {
  Data.find().exec(function(err, d) {
    return res.json(d);
  });
}
It seems that once the results are retrieved from Mongo, several loops over the whole result set (the map in rewriteIDs and the call to the toJSON method, i.e. 4,000,000 iterations each) take a very long time and cause the process to hang forever.
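As a side note, those per-record Waterline passes can be skipped by dropping down to the raw driver collection through sails-mongo's Model.native(). A sketch, assuming the Data model from the question; note this still buffers the full result with toArray(), so it only removes the ORM overhead, not the memory cost:

```javascript
// Reach the raw mongodb driver collection behind the Data model,
// bypassing Waterline's rewriteIDs / toJSON post-processing loops.
function sailsmongoNativeFind(req, res) {
  Data.native(function(err, collection) {
    if (err) { return res.serverError(err); }
    collection.find({}).toArray(function(findErr, docs) {
      if (findErr) { return res.serverError(findErr); }
      // Raw driver documents: _id is an ObjectID, not a rewritten id.
      return res.json(docs);
    });
  });
}
```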
Any idea on how to get Node / Mongo to handle queries over this amount of data?
Upvotes: 1
Views: 937
Reputation: 487
This is a lot of data to retrieve in a single operation; try to fetch the data asynchronously, in batches.
Maybe you can limit each query to 100,000 or 200,000 results and save them into an array, then ask for more results.
You could use the async library to achieve that.
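The batching idea above can be sketched as follows, assuming a connected mongodb driver collection; pageQuery, fetchPage, and fetchAfter are illustrative names, not library functions:

```javascript
// Pure helper: translate a page number into skip/limit options.
function pageQuery(page, pageSize) {
  return { skip: page * pageSize, limit: pageSize };
}

// Fetch one fixed-size page with skip/limit. Simple, but skip() gets
// slow for large offsets because the server still walks skipped docs.
function fetchPage(collection, page, pageSize, cb) {
  var q = pageQuery(page, pageSize);
  collection.find({}).skip(q.skip).limit(q.limit).toArray(cb);
}

// Range-based paging on the (indexed) timestamp field scales better:
// each call resumes after the last timestamp seen, with no skip cost.
function fetchAfter(collection, lastTimestamp, pageSize, cb) {
  collection.find({ timestamp: { $gt: lastTimestamp } })
            .sort({ timestamp: 1 })
            .limit(pageSize)
            .toArray(cb);
}
```

Since the documents carry a timestamp, the range-based variant is usually the better fit here: it avoids the growing cost of skip() on a 4-million-document collection, provided the timestamp field is indexed.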
Upvotes: 0