zevij
zevij

Reputation: 2446

mongodb aggregate with extra info

I have a mongo collection containing docs such as this:

{
    "_id" : ObjectId("57697321c22d3917acd66513"),
    "parent" : "AlphaNumericID",
    "signature" : "AnotherAlphaNumericID",
    "price" : 1638,
    "url" : "http://www.thecompany.com/path/to/page1",
    "date" : ISODate("2016-06-21T17:02:20.352Z"),
    "valid" : true
}

What I am trying to do is to run one query that would group on signature filed, return min and max price AND corresponding url:

{
        "signature" : "AnotherAlphaNumericID",  
        "min_price" : 1504,
        "min_rent_listing" : "http://www.thecompany.com/path/to/page1",
        "max_price" : 1737,
        "max_price_listing" : "http://www.thecompany.com/path/to/page2",
}

Running a $group on $signature field to obtain $min and $max is straight forward but in order to get the actual urls I split the query into 2 with the first query returning a sorted list of docs using $signature with prices from min to max and then (in python code) taking the first and last element. This works fine but would be nice to have one query.

Thoughts?

p.s.

Also 'toyed' with running one query for min and one for max and 'zipping' the results.

Upvotes: 0

Views: 95

Answers (1)

Saleem
Saleem

Reputation: 8978

You can play a trick with help of $group and $project. Assuming dataset is

{ 
    "_id" : ObjectId("57db28dc705af235a826873a"), 
    "parent" : "AlphaNumericID", 
    "signature" : "AnotherAlphaNumericID", 
    "price" : 1638.0, 
    "url" : "http://www.thecompany.com/path/to/page1", 
    "date" : ISODate("2016-06-21T17:02:20.352+0000"), 
    "valid" : true
}
{ 
    "_id" : ObjectId("57db28dc705af235a826873b"), 
    "parent" : "AlphaNumericID", 
    "signature" : "AnotherAlphaNumericID", 
    "price" : 168.0, 
    "url" : "http://www.thecompany.com/path/to/page2", 
    "date" : ISODate("2016-06-21T17:02:20.352+0000"), 
    "valid" : true
}
{ 
    "_id" : ObjectId("57db28dc705af235a826873c"), 
    "parent" : "AlphaNumericID", 
    "signature" : "AnotherAlphaNumericID", 
    "price" : 163.0, 
    "url" : "http://www.thecompany.com/path/to/page3", 
    "date" : ISODate("2016-06-21T17:02:20.352+0000"), 
    "valid" : true
}
{ 
    "_id" : ObjectId("57db28dc705af235a826873d"), 
    "parent" : "AlphaNumericID", 
    "signature" : "AnotherAlphaNumericID", 
    "price" : 1680.0, 
    "url" : "http://www.thecompany.com/path/to/page4", 
    "date" : ISODate("2016-06-21T17:02:20.352+0000"), 
    "valid" : true
}

Try following query in shell

db.collection.aggregate([
   {$sort:{price:1}},
   {$group:{
       _id:"$signature", 
       _first:{$first:"$url"},
       _last:{$last:"$url"}, 
       _min:{$first:"$price"}, 
       _max:{$last:"$price"}}
   },
   {$project:{
     _id:0, 
     min:{
       url:"$_first", 
       price:"$_min"}, 
     max:{
       url:"$_last", 
       price:"$_max"}}
   }
])

Output will be with minimum/maximum price and corresponding url

{ 
    "min" : {
        "url" : "http://www.thecompany.com/path/to/page3", 
        "price" : 163.0
    }, 
    "max" : {
        "url" : "http://www.thecompany.com/path/to/page4", 
        "price" : 1680.0
    }
}

What I changed from original answer: _min:{$min:"$price"}, --> to use $first _max:{$max:"$price"}} --> to use $last

Reason: we go into the pipeline with an ascending sort on price. By default, first record is min and last record is max.

Upvotes: 2

Related Questions