Khaled

Reputation: 43

Select rows at or after the most recent occurrence of a given value, per user

I have a MongoDB collection 'orders' that contains a list of users, each with an orderid and a time. It looks like this:

user    orderid     time    has_pending
10001   1       1510489123  0
10002   2       1510489125  0
10003   3       1510489127  0
10001   5       1510489131  1
10001   6       1510489133  1
10002   7       1510489135  0
10003   8       1510489137  0
10001   9       1510489139  1
10001   10      1510489141  0
10002   11      1510489143  1
10001   12      1510489145  0 <<<<< 
10002   13      1510489147  0 <<<<< 
10001   14      1510489149  1
10002   15      1510489151  1
10003   16      1510489153  1
10003   17      1510489155  1
10003   18      1510489157  1
10003   21      1510489163  1
10003   22      1510489165  0 <<<<< 

I'm trying to get, for each user, the list of orders whose time is >= the time of that user's last appearance of has_pending = 0.

For example, if we look at the data for user 10001:

user    orderid time    has_pending
10001   1   1510489123  0
10001   5   1510489131  1
10001   6   1510489133  1
10001   9   1510489139  1
10001   10  1510489141  0
10001   12  1510489145  0
10001   14  1510489149  1

So the result of the query for this user will be:

10001   12  1510489145  0
10001   14  1510489149  1

The required query should fetch the data for all users, and the result should be like this:

user    orderid     time    has_pending
10001   12      1510489145  0
10002   13      1510489147  0
10001   14      1510489149  1
10002   15      1510489151  1
10003   22      1510489165  0
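
To make the rule concrete, it can be written in plain JavaScript roughly as below. This is only a sketch of the expected behaviour, not a MongoDB query; the function name ordersSinceLastNonPending and the array-of-arrays layout are made up for illustration.

```javascript
// Illustration only: the selection rule in plain JavaScript, not a MongoDB query.
// Each row is [user, orderid, time, has_pending], copied from the table above.
const rows = [
    [10001, 1, 1510489123, 0], [10002, 2, 1510489125, 0], [10003, 3, 1510489127, 0],
    [10001, 5, 1510489131, 1], [10001, 6, 1510489133, 1], [10002, 7, 1510489135, 0],
    [10003, 8, 1510489137, 0], [10001, 9, 1510489139, 1], [10001, 10, 1510489141, 0],
    [10002, 11, 1510489143, 1], [10001, 12, 1510489145, 0], [10002, 13, 1510489147, 0],
    [10001, 14, 1510489149, 1], [10002, 15, 1510489151, 1], [10003, 16, 1510489153, 1],
    [10003, 17, 1510489155, 1], [10003, 18, 1510489157, 1], [10003, 21, 1510489163, 1],
    [10003, 22, 1510489165, 0]
];

function ordersSinceLastNonPending(rows) {
    // latest time per user at which has_pending is 0
    const lastZero = new Map();
    for (const [user, , time, pending] of rows) {
        if (pending === 0 && (!lastZero.has(user) || time > lastZero.get(user))) {
            lastZero.set(user, time);
        }
    }
    // keep rows at or after that time; users with no has_pending = 0 row are dropped
    return rows.filter(([user, , time]) => lastZero.has(user) && time >= lastZero.get(user));
}

const selected = ordersSinceLastNonPending(rows);
console.log(selected.map(r => r[1])); // [ 12, 13, 14, 15, 22 ]
```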

MySQL query:

SELECT
    t1.*
FROM
    test AS t1
LEFT JOIN test AS t2 ON t1.time >= t2.time AND t1.user = t2.user
WHERE
    t2.orderid= (SELECT max(orderid) FROM test WHERE user= t1.user AND has_pending = 0)

Any ideas on how I can get the same result in a single MongoDB query?

Thanks

Upvotes: 0

Views: 140

Answers (3)

dnickless

Reputation: 10918

Given the following input documents:

{ "user" : 10001, "orderid" : 1, "time" : 1510489123, "has_pending" : 0 }
{ "user" : 10002, "orderid" : 2, "time" : 1510489125, "has_pending" : 0 }
{ "user" : 10003, "orderid" : 3, "time" : 1510489127, "has_pending" : 0 }
{ "user" : 10001, "orderid" : 5, "time" : 1510489131, "has_pending" : 1 }
{ "user" : 10001, "orderid" : 6, "time" : 1510489133, "has_pending" : 1 }
{ "user" : 10002, "orderid" : 7, "time" : 1510489135, "has_pending" : 0 }
{ "user" : 10003, "orderid" : 8, "time" : 1510489137, "has_pending" : 0 }
{ "user" : 10001, "orderid" : 9, "time" : 1510489139, "has_pending" : 1 }
{ "user" : 10001, "orderid" : 10, "time" : 1510489141, "has_pending" : 0 }
{ "user" : 10002, "orderid" : 11, "time" : 1510489143, "has_pending" : 1 }
{ "user" : 10001, "orderid" : 12, "time" : 1510489145, "has_pending" : 0 }
{ "user" : 10002, "orderid" : 13, "time" : 1510489147, "has_pending" : 0 }
{ "user" : 10001, "orderid" : 14, "time" : 1510489149, "has_pending" : 1 }
{ "user" : 10002, "orderid" : 15, "time" : 1510489151, "has_pending" : 1 }
{ "user" : 10003, "orderid" : 16, "time" : 1510489153, "has_pending" : 1 }
{ "user" : 10003, "orderid" : 17, "time" : 1510489155, "has_pending" : 1 }
{ "user" : 10003, "orderid" : 18, "time" : 1510489157, "has_pending" : 1 }
{ "user" : 10003, "orderid" : 21, "time" : 1510489163, "has_pending" : 1 }
{ "user" : 10003, "orderid" : 22, "time" : 1510489165, "has_pending" : 0 }

Your query would need to look like this:

db.collection.aggregate([
{
    $sort: {
        "time": -1 // sort by "time" descending
    }
}, {
    $group: { // we want to slice our data per "user" so let's group by that field
        _id: "$user",
        "orders": {
            $push: "$$ROOT" // remember each document in an array per each "user" group (entries still sorted by "time" descending)
        }
    }
}, {
    $project: {
        "orders": { // our orders array shall only contain...
            $slice: [ "$orders", 0, { // ...all items from the last one up until...
                $add: [ { $indexOfArray: [ "$orders.has_pending", 0 ] }, 1 ] // ...the first appearance of a "has_pending" == 0 entry
                // the $add makes sure that we include the found element with "has_pending" == 0, too
            }]
        }
    }
}, {
    $unwind: "$orders" // restore original documents again by flattening the "orders" array
}, {
    $replaceRoot: { // move the (single) entry of the orders array to the root level of each document
        "newRoot": "$orders"
    }
}, {
    $sort: {
        "time": 1 // your example output was sorted by date so that's why we do that here, too...
    }
}])
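
To see what the $slice / $indexOfArray stage does, here is a minimal plain-JavaScript sketch of the same idea applied to a single user's array. It is not MongoDB code: the function name sliceToLastNonPending and the shortened sample array are made up for illustration, and it assumes the array is already sorted by "time" descending, as it is after the $sort and $group stages.

```javascript
// Illustration only: plain JavaScript, not an aggregation stage.
// Assumes `orders` holds one user's documents sorted by "time" descending.
function sliceToLastNonPending(orders) {
    // like $indexOfArray on "$orders.has_pending": index of the first 0, or -1 if there is none
    const idx = orders.map(o => o.has_pending).indexOf(0);
    // like $slice: [ "$orders", 0, idx + 1 ]: keep everything up to and including that entry
    return orders.slice(0, idx + 1);
}

// the four most recent orders of user 10001, newest first
const user10001 = [
    { user: 10001, orderid: 14, time: 1510489149, has_pending: 1 },
    { user: 10001, orderid: 12, time: 1510489145, has_pending: 0 },
    { user: 10001, orderid: 10, time: 1510489141, has_pending: 0 },
    { user: 10001, orderid: 9, time: 1510489139, has_pending: 1 }
];

const kept = sliceToLastNonPending(user10001);
console.log(kept.map(o => o.orderid)); // [ 14, 12 ]
```

For a user with no "has_pending" == 0 entry, indexOf returns -1 and this sketch returns an empty array. How the pipeline itself behaves for such a user is worth checking against your own data.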

This will give you the exact order and contents that you asked for (plus the _id field which I've omitted for brevity):

{ "user" : 10001, "orderid" : 12, "time" : 1510489145, "has_pending" : 0 }
{ "user" : 10002, "orderid" : 13, "time" : 1510489147, "has_pending" : 0 }
{ "user" : 10001, "orderid" : 14, "time" : 1510489149, "has_pending" : 1 }
{ "user" : 10002, "orderid" : 15, "time" : 1510489151, "has_pending" : 1 }
{ "user" : 10003, "orderid" : 22, "time" : 1510489165, "has_pending" : 0 }

Upvotes: 2

Alex P.

Reputation: 3171

Answer

db.getCollection('orders').aggregate([
{ $sort: {"time": -1}},
{ 
    $group:{
        _id: {
            user: "$user", 
            has_pending: "$has_pending"
            },
        time: { $first: "$time"},
        orderid: { $first: "$orderid"}
    }
},
{
    $project: {
        _id: 0,
        user: "$_id.user",
        orderid: "$orderid",
        time: "$time",
        has_pending: "$_id.has_pending"
    }
}
])

If you want to understand what each stage of the aggregation pipeline does, you can continue reading.

To explain what happens in each stage I'll take a subset of what you posted. So let's say we have these documents:

user    orderid     time    has_pending
10001   1       1510489123  0
10002   2       1510489125  0
10001   5       1510489131  1
10002   7       1510489135  0
10002   11      1510489143  1
10001   12      1510489145  0  
10002   13      1510489147  0 
10001   14      1510489149  1
10002   15      1510489151  1

Explain $sort result

Sorting with { $sort: {"time": -1}} orders your results by time, descending. That makes your results look like this:

user    orderid     time    has_pending
10002   15      1510489151  1
10001   14      1510489149  1
10002   13      1510489147  0
10001   12      1510489145  0
10002   11      1510489143  1
10002   7       1510489135  0
10001   5       1510489131  1
10002   2       1510489125  0
10001   1       1510489123  0

Explain $group pipeline

The keys we want to group by

Now we can group our results by user and has_pending, because we only need one result for each combination of user and has_pending. So we only need one document for each of:

user: 10001 with has_pending: 0, 
user: 10001 with has_pending: 1, 
user: 10002 with has_pending: 0, 
user: 10002 with has_pending: 1

This is what the _id of the $group stage does:

_id: {
        user: "$user", 
        has_pending: "$has_pending"
        }

The _id field in $group is mandatory; it describes what you want to group by.

Use of $first inside group pipeline

Note that I added:

time: { $first: "$time"},
orderid: { $first: "$orderid"}

I use $first because I know that the documents are sorted by time descending, so the first document in each group is the most recent one. That means:

user: 10001 with has_pending: 0 will take "time" : 1510489145 and "orderid" : 12
user: 10001 with has_pending: 1 will take "time" : 1510489149 and "orderid" : 14
user: 10002 with has_pending: 0 will take "time" : 1510489147 and "orderid" : 13
user: 10002 with has_pending: 1 will take "time" : 1510489151 and "orderid" : 15
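
The combination of $sort and $first can be mimicked in plain JavaScript on the subset above. This is only a rough sketch, not MongoDB code; the variable names are made up for illustration. A Map keeps insertion order, whereas the order of documents coming out of $group should not be relied on.

```javascript
// Illustration only: plain JavaScript, not an aggregation pipeline.
const docs = [
    { user: 10001, orderid: 1, time: 1510489123, has_pending: 0 },
    { user: 10002, orderid: 2, time: 1510489125, has_pending: 0 },
    { user: 10001, orderid: 5, time: 1510489131, has_pending: 1 },
    { user: 10002, orderid: 7, time: 1510489135, has_pending: 0 },
    { user: 10002, orderid: 11, time: 1510489143, has_pending: 1 },
    { user: 10001, orderid: 12, time: 1510489145, has_pending: 0 },
    { user: 10002, orderid: 13, time: 1510489147, has_pending: 0 },
    { user: 10001, orderid: 14, time: 1510489149, has_pending: 1 },
    { user: 10002, orderid: 15, time: 1510489151, has_pending: 1 }
];

// like { $sort: { time: -1 } }: newest first (copy so `docs` is left untouched)
const sorted = [...docs].sort((a, b) => b.time - a.time);

// like $group with $first: keep only the first document seen per (user, has_pending) key
const firstPerGroup = new Map();
for (const doc of sorted) {
    const key = doc.user + ":" + doc.has_pending;
    if (!firstPerGroup.has(key)) {
        firstPerGroup.set(key, doc);
    }
}

const result = [...firstPerGroup.values()];
console.log(result.map(d => d.orderid)); // [ 15, 14, 13, 12 ]
```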

Explain $project pipeline

The $project stage in this case is only there to "normalize" your results, so that the end result has the shape you asked for.

Upvotes: 0

Eyk Rehbein

Reputation: 3868

You can use the sort() method of the cursor. An example in the MongoDB shell with the same output as your SQL query:

db.collection.find({}).sort({ user: 1, orderid: 1, time: 1, has_pending: 1 }).pretty()

Upvotes: -1
