Joe

Reputation: 4234

Mongoose find() slow on large documents

This query never finishes:

    const stocks = await mongoose.model("stock").find().exec();
    console.log(stocks.length);

This query executes in under 1 second:

    const stocks = await mongoose.model("stock").find().select("ticker").exec();
    console.log(stocks.length);

I have a lot of data on each stock (10 years of stock data).

What is mongoose doing? Some validation on each find? Some setting I can use to turn it off?

Or is the only option to use mongodb native?

Update:

OK, so I tried the native MongoDB driver instead:

    console.log("start");
    const MongoClient = mongodb.MongoClient;
    await MongoClient.connect(connectionMongoUrl, (err, db) => {
      if (err != null) {
        console.log(err);
        return Promise.reject(err);
      }
      const dbo = db.db("tradestats");
      dbo
        .collection("stocks")
        .find({})
        .toArray(async function (err, stocks) {
          console.log(stocks.length); // never fires
        });
    });

Same problem! It never finishes. So it's not mongoose then. What can it be? Some memory setting in Node.js or something?

Update2:

Is it bad data model design? Wrong database? Should 10 years of stock prices be put in a separate collection and referenced instead?

Upvotes: 4

Views: 1454

Answers (1)

Citrullin

Reputation: 2321

What is mongoose doing? Some validation on each find? Some setting I can use to turn it off?

In short: it does a lot of different things. There is a good article about the details of find().
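One of those things is turning every result into a full Mongoose document, which costs time and memory on large records. A minimal sketch of skipping that step with lean(), assuming `model` is a Mongoose model such as mongoose.model("stock"); the helper name and the default field list are hypothetical. Since the native driver hangs in the same way, this alone is unlikely to be the whole fix.

```javascript
// Sketch only: `model` is assumed to be a Mongoose model, e.g. mongoose.model("stock").
// `loadPlainStocks` and the default field list are hypothetical, for illustration.
async function loadPlainStocks(model, fields = "ticker") {
  // select() limits which fields are returned by the server;
  // lean() asks Mongoose for plain JavaScript objects instead of hydrated documents.
  return model.find().select(fields).lean().exec();
}
```

Usage would look like `const stocks = await loadPlainStocks(mongoose.model("stock"), "ticker name");`, where "name" is just an example field.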

10 years of stock data is a lot. I don't think you are using the right database for your use case. MongoDB is not designed for this kind of use case. It is a document-oriented database and you should treat it as one. I highly recommend a column-based database for this kind of use case. Cassandra might be a good choice. Another, more complicated solution might be Apache ORC on the Hadoop file system, if you need it to scale.

You can also try increasing the timeout settings so that the query doesn't time out. That will not improve performance, but at least your query will not fail. MongoDB needs to iterate over all these documents to get the desired information. Without indexes, that can be a lot of work, which leads to poor performance, so adding indexes might help. Another possible improvement with MongoDB is to use the Aggregation Pipeline or Map-Reduce to limit memory usage.
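To make the Aggregation Pipeline idea concrete, here is a rough sketch that trims each stock to its most recent price points on the server and then walks the results one at a time instead of loading everything with toArray(). It assumes `collection` is a native driver collection (for example dbo.collection("stocks")) and that each document has a `ticker` field and a `prices` array; those field names and both helper names are guesses about the schema, not something stated in the question.

```javascript
// Hypothetical schema: each stock document has `ticker` and a `prices` array.
function buildRecentPricesPipeline(lastN) {
  if (!Number.isInteger(lastN) || lastN <= 0) {
    throw new RangeError("lastN must be a positive integer");
  }
  return [
    // $slice with a negative count keeps only the last N array elements.
    { $project: { ticker: 1, prices: { $slice: ["$prices", -lastN] } } },
  ];
}

// Iterates the aggregation cursor document by document, so the whole
// result set is never held in memory at once. Returns the number handled.
async function forEachStock(collection, lastN, handle) {
  const cursor = collection.aggregate(buildRecentPricesPipeline(lastN));
  let count = 0;
  for await (const stock of cursor) {
    await handle(stock);
    count += 1;
  }
  return count;
}
```

A call might look like `await forEachStock(dbo.collection("stocks"), 30, (s) => console.log(s.ticker));`. Whether 30 points per stock is enough depends on what the data is needed for.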

Upvotes: 1
