czuriaga

Reputation: 141

MongoDB query joining two collections

I have 2 mongo collections:

companies: Each record is a company with many fields (city, country, etc.) —> 100k rows

{company_id:1, country:"USA", city:"New York",...}
{company_id:2, country:"Spain", city:"Valencia",... }
{company_id:3, country:"France", city:"Paris",... }

scores: There are blocks of dates; within each block, every row has a company_id + a score —> 100k rows in each block

{date: 2016-05-29, company_id:1, score:90}
{date: 2016-05-29, company_id:2, score:87}
{date: 2016-05-29, company_id:3, score:75}
...
{date: 2016-05-22, company_id:1, score:88}
{date: 2016-05-22, company_id:2, score:87}
{date: 2016-05-22, company_id:3, score:76}
...
{date: 2016-05-15, company_id:1, score:91}
{date: 2016-05-15, company_id:2, score:82}
{date: 2016-05-15, company_id:3, score:73}
...

Objective:

I want to retrieve a list of companies that can be filtered by some fields (country, city, ...), plus each company's newest score (from 2016-05-29), ordered by score descending

That is: filter in one collection, filter + order in the other collection

Note: There's an index on scores.date, and we can locate/pre-calculate the highest date (2016-05-29 in this example) easily and quickly
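
For instance, a minimal sketch of that lookup, walking the scores.date index backwards:

db.scores.find({}, {date: 1, _id: 0}).sort({date: -1}).limit(1)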

Attempts:

I’ve been trying an aggregate query using $lookup. When the filter is fully specified (and the number of matching companies is small), the query is fast.

The query is as below:

db.companies.aggregate([
{$match: {"status": "running", "country": "USA", "city": "San Francisco",
         "categories": { $in: ["Software"]}, dummy: false}},
{$lookup: {from: "scores", localField: "company_id", foreignField: "company_id", as:"scores"}},
{$unwind: "$scores"},
{$project: {_id:            "$_id",
            "company_id":   "$company_id",
            "company_name": "$company_name",
            "status":       "$status",
            "city":         "$city",
            "country":      "$country",
            "categories":   "$categories",
            "dummy":        "$dummy",
            "score":        "$scores.score",
            "date":         "$scores.date"}},
{$match: {"date" : ISODate("2016-05-29T00:00:00Z")}},
{$sort: {"score":-1}}
],{allowDiskUse: true})

But when the filter is small or empty (more companies match), the $sort stage takes several seconds.

db.companies.aggregate([
{$match: {"status": "running"}},
{$lookup: {from: "scores", localField: "company_id", foreignField: "company_id", as:"scores"}},
{$unwind: "$scores"},
{$project: {_id:            "$_id",
            "company_id":   "$company_id",
            "company_name": "$company_name",
            "status":       "$status",
            "city":         "$city",
            "country":      "$country",
            "categories":   "$categories",
            "dummy":        "$dummy",
            "score":        "$scores.score",
            "date":         "$scores.date"}},
{$match: {"date" : ISODate("2016-05-29T00:00:00Z")}},
{$sort: {"score":-1}}
],{allowDiskUse: true})

Probably because of the number of companies the filter finds: 59 rows are easier to sort than 89k.

> db.companies.count({"status": "running", "country": "USA", "city": "San Francisco", "categories": { $in: ["Software"]}, dummy: false})
59
> db.companies.count({"status": "running"})
89043
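
For reference, a sketch of the same pipeline with the date match moved up, right after the $unwind, so score documents from older dates are discarded before the projection (it does not shrink the final sort, which still sees one document per matching company):

db.companies.aggregate([
{$match: {"status": "running"}},
{$lookup: {from: "scores", localField: "company_id", foreignField: "company_id", as:"scores"}},
{$unwind: "$scores"},
{$match: {"scores.date": ISODate("2016-05-29T00:00:00Z")}},
{$project: {"company_id": 1, "company_name": 1, "status": 1,
            "score": "$scores.score", "date": "$scores.date"}},
{$sort: {"score": -1}}
],{allowDiskUse: true})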

I've tried a different approach: aggregating over scores, filtering by date, and sorting by score (the date+score index is very useful here). Everything is very fast until the last $match, where I filter on company attributes.

db.scores.aggregate([
{$match:{"date" : ISODate("2016-05-29T00:00:00Z")}},
{$sort:{"score":-1}},
{$lookup:{from: "companies", localField: "company_id", foreignField: "company_id", as:"companies"}},
{$unwind:"$companies"},
{$project: {_id:             "$companies._id",
            "company_id":    "$companies.company_id",
            "company_name":  "$companies.company_name",
            "status":        "$companies.status",
            "city":          "$companies.city",
            "country":       "$companies.country",
            "categories":    "$companies.categories",
            "dummy":         "$companies.dummy"}},
            "score":         "$score",
            "date":          "$date"
{$match:{"status": "running", "country":"USA", "city": "San Francisco",
         "categories": { $in: ["Software"]}, dummy: false}}
],{allowDiskUse: true})

Using this approach, the large filter (the previous example) is very slow, and the small filter (just {"status": "running"}) is faster.

Is there any way to join both collections, filter on both of them, and order by one field?

Upvotes: 2

Views: 1540

Answers (1)

Gerald Mücke

Reputation: 11132

From what I can see, there are only a couple of scores per company (not many), on various dates. So it's sort of a 1:few relationship.

So the first thing that jumps to my mind: why not put the scores in the companies collection?

{ company_id:1, 
  country:"USA", 
  city:"New York",
  ...
  scores: [
    {date: 2016-05-29, score:90},
    ...
  ]
}

That way the structure is more aligned with your access pattern, and you could skip the lookup entirely. That means you could define a proper index and use find() instead of aggregation.
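
A minimal sketch of that, assuming the embedded schema above and that the newest date is already known (the index fields are illustrative):

// Compound index covering the filter fields plus the embedded scores
db.companies.createIndex({status: 1, country: 1, city: 1, "scores.date": 1, "scores.score": -1})

// find() replaces the whole aggregation: filter companies, require a score
// for the newest date, and sort by score descending
db.companies.find({
    status: "running", country: "USA", city: "San Francisco",
    categories: "Software", dummy: false,
    scores: {$elemMatch: {date: ISODate("2016-05-29T00:00:00Z")}}
}).sort({"scores.score": -1})

One caveat: a descending sort on an array field like scores.score uses the highest value in the array, so this works best if only the scores you actually query stay embedded (e.g. old blocks get trimmed), or if the newest score is additionally stored in a top-level field.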

Apart from this, I was wondering why you use the allowDiskUse: true flag. 100k documents don't sound like that many, and they should fit entirely into memory, even within the limited (100 MB per stage) aggregation pipeline buffer.

To explain why the filters (short = not very selective, long = very selective) behave differently depending on which collection you start with (scores vs. companies):

  • company first:
    • short filter: a lot of companies match the criteria, hence a lot of companies have to be sorted (you need them all in memory to sort). If part of the result set gets written to disk, this could take quite some time.
    • long filter: only a small set of companies match, only a few companies have to be sorted in the end, probably entirely in memory
  • score first - the date might have an impact, as it defines how many companies are affected
    • long filter in the end: the result of the previous aggregation stages has to be searched in order to find matching elements, and no index can be used for that. So the match operation might take longer, as more criteria have to be evaluated - probably against data on disk.
    • short filter in the end: the result of the previous stages only has to be checked against a single criterion, which is faster.

So what you should check out:

  • disable allowDiskUse and check whether the query still fits into memory, or check the tmp files to see whether data is actually written to disk (see the sketch after this list)
  • limit the scope of the search, reducing the amount of data to be processed
  • change your schema to better match your access pattern
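
A minimal sketch of the first check, reusing the short-filter pipeline from the question. allowDiskUse defaults to false, so simply omitting it makes the aggregation fail if a stage exceeds the memory limit; the explain option shows the plan (e.g. index usage of the initial $match) without running the whole query:

// Without allowDiskUse the aggregation errors out instead of spilling to disk
db.companies.aggregate([
{$match: {"status": "running"}},
{$lookup: {from: "scores", localField: "company_id", foreignField: "company_id", as:"scores"}},
{$unwind: "$scores"},
{$match: {"scores.date": ISODate("2016-05-29T00:00:00Z")}},
{$sort: {"scores.score": -1}}
])

// explain: true returns the query plan instead of the documents
db.companies.aggregate([{$match: {"status": "running"}}], {explain: true})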

Upvotes: 2
