James B.
James B.

Reputation: 651

How do I get the latest record for each item in CosmosDB using SQL

I have a schema which is similar to

"id": "uuid",
"deviceId": "uuid",
"message": {
    "content": "string",
    "ts": 1
},
"data": {
    "temperature": 21
}

I'd like to get the latest "data" (using message.ts as the timestamp) for each "deviceId".

So far, I've managed to get the data back, in order of timestamp using the query SELECT c.deviceId, c.message.ts, c.data FROM c ORDER BY c.message.ts DESC but I can't figure out how to remove the duplicate device records.

Is this possible to do within the CosmosDB SQL Engine?

Upvotes: 3

Views: 7447

Answers (4)

siddharth
siddharth

Reputation: 231

I agree with Steve Johnson's answer, but I think, at the time the answer was given Cosmos might not be supporting Joins, as of today it does.

You need to run 2 queries to do the same.

First, Run a query to get the max of every c.message.ts per device Id just like Steve Johnson's answer.

SELECT c.deviceId, max(c.message.ts) as lastest FROM c group by c.deviceId

Then, use the response from the above query with JOIN in Cosmos DB.

SELECT s.id
FROM
    <Container_name> AS s
JOIN m IN (
    SELECT VALUE [
        {deviceId: 'uuid1', lastest: '1'},
        {deviceId: 'uuid2', lastest: '2'}  
    ]
)
WHERE
    s.deviceId = m.deviceId
    and s.message.ts = m.lastest

Learn more here, https://learn.microsoft.com/en-us/azure/cosmos-db/nosql/query/subquery

Upvotes: 0

user9581760
user9581760

Reputation: 11

Another route you could take is to use a trigger function in CosmosDb. The benefit of this would be that you don't need to deploy an Azure function and you can just use SQL to get the most recent item. For example, when you get a new item, you can use a pre-trigger to set a field as follows: latest = true, and at the same time change the previous most recent item's latest field to false. Then your SQL query would simply need WHERE latest = true to return the most recent record for every item. Here is a trigger function with the general idea:

function setLatest() {  
    var context = getContext();  
    var request = context.getRequest();  
  
    // item to be created in the current operation  
    var itemToCreate = request.getBody();  
  
    // validate properties  
    if (!("latest" in itemToCreate)) {
        itemToCreate["latest"] = true;  
    }
    // update the old latest to false
    removeCurrentLatest(itemToCreate["id"],..., );
    // save the newer item that will be created  
    request.setBody(itemToCreate);
}

function removeCurrentLatest(id, ...) {
    var collection = getContext().getCollection();
    var collectionLink = collection.getSelfLink();
        // Get the document. We keep it in the same collection.
        var isAccepted = collection.queryDocuments
            (collectionLink, `SELECT * FROM root r WHERE r.id = "${id}" AND .... AND r.latest = true`,
            function (err, feed, options) {
            if (err) throw err;
            if (feed && feed[0] != null) 
            {
                var oldDoc = feed[0];
                oldDoc.latest = false;
                var isAccepted = collection.replaceDocument(oldDoc._self, oldDoc, function (err) {
                    if (err) throw err;
                    });
                if (!isAccepted) throw new Error("The call replaceDocument(oldDoc) returned false.");
            }
        });
        if (!isAccepted) throw new Error("The call queryDocuments for oldDoc returned false.");
    }

I've removed some of the other conditions you may want to include to ensure you select the correct previous version of the item - hopefully it should be obvious how to add those for your specific asset. This is an excellent post on using triggers in CosmosDB: http://www.johndowns.co.nz/blog/2018/1/30/cosmos-db-server-side-programming-with-typescript-part-4-triggers

Upvotes: 1

James B.
James B.

Reputation: 651

Thanks to Mark Brown's comment, I found the following which seems to be the correct solution to this problem. Not as elegant as just using some SQL for a one-off but is really what was needed.

https://learn.microsoft.com/en-us/samples/azure-samples/cosmosdb-materialized-views/real-time-view-cosomos-azure-functions/

In essence, you create a Serverless Function which is triggered by the Cosmos change feed and updates a materialized view, which is essentially just a document with (in this case) the most up to date data per deviceId.

Specifically for this case, it'll most likely update the corresponding device document with it's most recent data.

Upvotes: 0

Steve Johnson
Steve Johnson

Reputation: 8660

It is impossible to achieve this with one SQL by now.

May be this can be an alternative.

First, run this SQL SELECT c.deviceId,max(c.message.ts) as lastest FROM c group by c.deviceId.

Then, you can get data by this SQL, SELECT * FROM c WHERE c.deviceId = 'xxx' AND c.message.ts = xxxx

Upvotes: 0

Related Questions