dani

Reputation: 5000

Increase performance for this MongoDB query

I have a MongoDB document with quite a large embedded array:

name : "my-dataset"
data : [ 
          {country : "A", province: "B", year : 1990, value: 200}
          ... 150 000 more 
       ]

Let us say I want to return data objects where country == "A".

  1. What is the proper way of doing this, for example via Node.js?

  2. Given 150 000 entries with 200 matches, how long should the query take approximately?

  3. Would it be better (performance/structure wise) to store data as documents and the name as a property of each document?

  4. Would it be more efficient to use MySQL for this?

Upvotes: 0

Views: 709

Answers (2)

ploutch

Reputation: 1214

1. You can use the MongoDB aggregation framework:

db.collection.aggregate([
  {$match: {name: "my-dataset"}},
  {$unwind: "$data"},
  {$match: {"data.country": "A"}}
])

This returns one document for each data entry where the country is "A". If you want to regroup the matching entries into a single document per dataset, add a $group stage:

db.collection.aggregate([
  {$match: {name: "my-dataset"}},
  {$unwind: "$data"},
  {$match: {"data.country": "A"}},
  {$group: {_id: "$_id", data: {$addToSet: "$data"}}}
])

(I didn't test this on a proper dataset, so it may contain bugs.)
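Since the question asks about Node.js, here is a minimal sketch of how the same pipeline could be built there. `buildCountryPipeline` is a hypothetical helper name, and the commented-out call assumes a `collection` handle obtained from the official `mongodb` driver; like the pipeline itself, it has not been run against a real dataset.

```javascript
// Hypothetical helper: builds the $unwind/$group pipeline from this answer
// for a given dataset name and country.
function buildCountryPipeline(datasetName, country) {
  return [
    // Select the one dataset document first, so only its array is unwound.
    { $match: { name: datasetName } },
    // Emit one document per element of the embedded array.
    { $unwind: "$data" },
    // Keep only the entries for the requested country.
    { $match: { "data.country": country } },
    // Regroup the surviving entries under their parent document.
    { $group: { _id: "$_id", data: { $addToSet: "$data" } } },
  ];
}

const pipeline = buildCountryPipeline("my-dataset", "A");

// Assumed usage with the official Node.js driver (needs a running server):
//   const results = await collection.aggregate(pipeline).toArray();
```

Note that $addToSet drops entries that are exact duplicates of each other and does not guarantee order; if duplicates matter, $push keeps every matching entry.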

2. 150,000 subdocuments is still not a lot for MongoDB, so if you're only querying one dataset it should be pretty fast (on the order of milliseconds).

3. As long as you are sure that your document is going to stay smaller than 16MB, the maximum BSON document size (which is kind of hard to say), it should be fine. However, the queries would be simpler if you stored your data as separate documents with the dataset name as a property, which is generally better for performance.
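To get a feel for how close 150,000 entries come to the 16MB limit, the size can be estimated from the BSON layout. This is only a rough sketch: it assumes every entry looks like the sample from the question and that numbers are stored as 4-byte int32 values (doubles would take 8 bytes each). All function names are made up for illustration.

```javascript
// Rough BSON size estimate for the embedded array, following the BSON layout:
//   document = int32 length + elements + 1 NUL byte
//   element  = 1 type byte + key as C string (UTF-8 bytes + NUL) + value
const utf8Length = (s) => new TextEncoder().encode(s).length;

// Assumption: strings are BSON strings (int32 length + bytes + NUL),
// and every number is stored as an int32 (4 bytes).
function valueSize(value) {
  return typeof value === "string" ? 4 + utf8Length(value) + 1 : 4;
}

function elementSize(key, payloadBytes) {
  return 1 + utf8Length(key) + 1 + payloadBytes;
}

function documentSize(doc) {
  let size = 4 + 1; // length prefix + trailing NUL
  for (const [key, value] of Object.entries(doc)) {
    size += elementSize(key, valueSize(value));
  }
  return size;
}

// A BSON array is stored as a document whose keys are "0", "1", "2", ...
function arraySize(entryCount, entryBytes) {
  let size = 4 + 1;
  for (let i = 0; i < entryCount; i++) {
    size += elementSize(String(i), entryBytes);
  }
  return size;
}

const sampleEntry = { country: "A", province: "B", year: 1990, value: 200 };
const entryBytes = documentSize(sampleEntry);
const totalBytes = arraySize(150000, entryBytes);
const limitBytes = 16 * 1024 * 1024;

console.log(entryBytes, totalBytes, totalBytes < limitBytes);
```

Under these assumptions the sample entry takes 57 bytes and the whole array roughly 9.6 million bytes, so it fits, but longer strings or a few extra fields per entry would use up the remaining headroom quickly.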

Upvotes: 1

Mark_H

Reputation: 790

A) Just find them with a query.

B) If the compound index {name: 1, "data.country": 1} is built, finding the dataset document should be fast. But because you store all the data in one array, the $unwind operator has to be used to extract the matching entries. As a result, the query could be slow.

C) Yes, it would be better. Store the data as one document per entry, like this:

{country : "A", province: "B", year : 1990, value: 200, name:"my-dataset"}
{country : "B", province: "B", year : 1990, value: 200, name:"my-dataset"}
...

With the compound index {name: 1, country: 1}, the query time should be under 10 ms.
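A minimal sketch of moving from the embedded array to this one-document-per-entry layout in Node.js. `flattenDataset` and the `entries` collection handle are hypothetical names, and the commented-out driver calls assume the official `mongodb` package and a running server.

```javascript
// Hypothetical helper: turns one dataset document with an embedded array
// into one flat document per entry, tagged with the dataset name.
// The original entries are copied, not modified.
function flattenDataset(dataset) {
  return dataset.data.map((entry) => ({ ...entry, name: dataset.name }));
}

const dataset = {
  name: "my-dataset",
  data: [
    { country: "A", province: "B", year: 1990, value: 200 },
    { country: "B", province: "B", year: 1990, value: 200 },
  ],
};

const flatDocs = flattenDataset(dataset);

// Compound index and query filter matching the layout above.
const indexSpec = { name: 1, country: 1 };
const filter = { name: "my-dataset", country: "A" };

// Assumed usage with the official Node.js driver:
//   await entries.insertMany(flatDocs);
//   await entries.createIndex(indexSpec);
//   const matches = await entries.find(filter).toArray();
```

With this layout the query is a plain find on two indexed fields, so no $unwind stage is needed.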

D) See the question "MySQL vs MongoDB 1000 reads".

Upvotes: 1
