Ovi

Reputation: 2750

CosmosDb count distinct elements

Is there a direct function to count distinct elements in a CosmosDb query?

This is the default count:

SELECT value count(c.id) FROM c

And distinct works without count:

SELECT distinct c.id FROM c

But this returns a Bad Request - Syntax Error: SELECT value count(distinct c.id) FROM c

How would count and distinct work together?

Upvotes: 16

Views: 39140

Answers (7)

Aboo

Reputation: 2454

[Update 19 Nov 2020]

Here is another query that solves the distinct issue and works for count. Basically, you need to encapsulate the distinct in a subquery and then count. We have also tested it with paging, for cases where you want the unique records and not just the count, and it works as well.

select value count(1) from c join (select distinct value c from p in c.products)

You can also use a WHERE clause inside or outside the parentheses, depending on what your condition is based on.

This is also mentioned slightly differently in another answer here.

Check the select clause documentation for CosmosDB.

@ssmsexe brought this to my attention and I wanted to update the answer here.

[Original Answer]

Support for distinct was added on 19th Oct 2018.

The following query works just fine

SELECT distinct value c FROM c join p in c.products

However, it still doesn't work for count.

The workaround for counting distinct values is to create a stored procedure that performs the distinct count. It runs the query, follows the continuation token until the results are exhausted, and returns the count.

If you pass a distinct query like the one above to the stored procedure below, you will get a distinct count:

function count(queryCommand) {
  var response = getContext().getResponse();
  var collection = getContext().getCollection();
  var count = 0;

  query(queryCommand);

  function query(queryCommand, continuation){
    var requestOptions = { continuation: continuation };
    var isAccepted = collection.queryDocuments(
        collection.getSelfLink(),
        queryCommand,
        requestOptions,
        function (err, feed, responseOptions) {
            if (err) {
                throw err;
            }

            //  Scan results
            if (feed) {
                count+=feed.length;
            }

            if (responseOptions.continuation) {
                //  Continue the query
                query(queryCommand, responseOptions.continuation)
            } else {
                //  Return the count in the response
                response.setBody(count);
            }
        });
    if (!isAccepted) throw new Error('The query was not accepted by the server.');
  }
}

The issue with that workaround is that it can exceed the RU limit on your collection and fail. If that happens, you can implement similar logic in your own server-side application code, which is not ideal.
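If the paging loop has to move out of the stored procedure, it might look roughly like the sketch below. `fetchPage` is a hypothetical helper, not a Cosmos DB API: it stands in for whatever SDK call returns one page of results plus a continuation token. The in-memory fake exists only so the sketch runs on its own.

```javascript
// Counts the rows of a DISTINCT query by paging from application code.
// fetchPage(query, continuation) is a hypothetical helper (an assumption);
// it must resolve to { items: [...], continuation: string | undefined }.
async function countByPaging(fetchPage, query) {
  let count = 0;
  let continuation;
  do {
    const page = await fetchPage(query, continuation);
    count += page.items.length;
    continuation = page.continuation;
  } while (continuation);
  return count;
}

// In-memory stand-in for the database, for illustration only.
function makeFakeFetchPage(allItems, pageSize) {
  return async (_query, continuation) => {
    const start = continuation ? Number(continuation) : 0;
    const end = start + pageSize;
    return {
      items: allItems.slice(start, end),
      continuation: end < allItems.length ? String(end) : undefined,
    };
  };
}

// Usage: five fake distinct rows, served two per page.
const fakeFetchPage = makeFakeFetchPage(["a", "b", "c", "d", "e"], 2);
countByPaging(fakeFetchPage, "SELECT DISTINCT VALUE c.id FROM c")
  .then((total) => console.log(total));
```

Each page is a separate request here, so no single call has to finish the whole scan the way the stored procedure does.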

Upvotes: 11

gorrch

Reputation: 551

I did some investigation and found a solution. To get the count of distinct results you cannot use count(1). You need to "wrap" the subquery with AS subqueryName and then use count(subqueryName), like below:

select count(subqueryName) from (SELECT distinct r.x FROM r) as subqueryName

Cheers!

Upvotes: 4

Anton M

Reputation: 818

To count distinct elements you have to use COUNT and GROUP BY together. You don't need subqueries; it works in a very simple query like this example, where we want to list all the unique family last names in our container and the count of families having the same name:

select count(1) as numfam, f.lastName from f group by f.lastName

result:

[
    {
        "numfam": 1
    },
    {
        "numfam": 1,
        "lastName": "Wakefield"
    },
    {
        "numfam": 2,
        "lastName": "Andersen"
    }
]

Notice that I have one item in my collection without lastName: the freedom of being schemaless.

Unfortunately, as of today it is not yet possible to add an ORDER BY clause to this query, for example to sort the most common names in descending order. The Cosmos team has said it is working on it, so this feature is expected at some point. You can always sort the result in your client code.
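A minimal sketch of that client-side sort, using rows copied from the GROUP BY result above. The function name is made up for illustration.

```javascript
// Rows shaped like the GROUP BY result above (values copied from that output).
const rows = [
  { numfam: 1 },
  { numfam: 1, lastName: "Wakefield" },
  { numfam: 2, lastName: "Andersen" },
];

// Returns a sorted copy, most common name first; the input is left untouched.
function sortByCountDesc(groups) {
  return [...groups].sort((a, b) => b.numfam - a.numfam);
}

const sorted = sortByCountDesc(rows);
console.log(sorted[0].lastName); // "Andersen"

// Each row is one group, so the row count is the number of distinct
// lastName values (the group with no lastName is counted too).
console.log(rows.length); // 3
```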

If you want to know the count for a specific name you can use this query (a parameterized query lets you input the name in one place):

select "Andersen" as lastName, count(1) as numfam from f  
where  f.lastName = "Andersen"

result:

[
    {
        "lastName": "Andersen",
        "numfam": 2
    }
]
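
A hedged sketch of the parameterized variant follows. The `{ query, parameters }` object is, to my knowledge, the query-spec shape the Cosmos DB JavaScript SDK accepts, but check it against your SDK version. The function name is hypothetical, and the projection drops the literal lastName because the caller already knows it.

```javascript
// Builds a parameterized query spec that counts families with one last name.
// Assumption: your SDK accepts a { query, parameters } query spec.
function buildCountByLastNameQuery(lastName) {
  return {
    query: "SELECT COUNT(1) AS numfam FROM f WHERE f.lastName = @lastName",
    parameters: [{ name: "@lastName", value: lastName }],
  };
}

const spec = buildCountByLastNameQuery("Andersen");
console.log(JSON.stringify(spec, null, 2));
```

The name then appears in exactly one place, the function argument, and is never concatenated into the query text.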

Upvotes: 1

Visakh

Reputation: 81

I know this is an old thread.

However, just to keep the topic updated: currently (Jul 2020) you can run SELECT DISTINCT over a Cosmos DB table. However, directly applying COUNT(DISTINCT..) doesn't give correct results. Hence, you need a workaround like the one below, which uses a subquery to get the correct distinct count:

SELECT COUNT(UniqueIDValues) AS UniqueCount
FROM (SELECT Id FROM c GROUP BY Id) AS UniqueIDValues

Upvotes: 8

Azutanguy

Reputation: 141

How about SELECT COUNT(1) FROM (SELECT distinct c.id FROM c) AS t;? – Evaldas Buinauskas May 30 '18 at 14:44

As of 15 May 2019, the query in the comment above works with a WHERE condition. I didn't try it with a JOIN, but the query does return the answer I'm looking for.

It also works with the 100-element limitation in CosmosDB.

As an example with Product, it would be: SELECT COUNT(1) FROM (SELECT DISTINCT c.Id FROM c WHERE c.Brand = 'Coca')

Upvotes: 9

Ravikumar B

Reputation: 892

Azure Cosmos DB doesn't support the DISTINCT keyword yet as part of the SQL API. The best way to achieve this is to use a stored procedure with custom code. Please find more details regarding the custom stored procedure here.

It seems the DISTINCT keyword is under development.
Please find the reference link here.

Cosmos DB supports most of the aggregate functions; please see the list of supported aggregate functions here.

Please find more details in the following link.

Upvotes: 0

Olha Shumeliuk

Reputation: 740

As far as I know, Cosmos DB does not support nested queries for now.

The only way to do what you want is to return all distinct ids as a query result and then count them. You can do it either directly in code or with the help of a stored procedure (which should be more efficient for a large number of docs).
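
As a minimal sketch of the "count in code" option, assuming the documents (or just their ids) have already been fetched into an array. The function name and the sample documents are made up for illustration.

```javascript
// Counts the distinct values of one property across already-fetched documents.
function countDistinct(docs, key) {
  const seen = new Set();
  for (const doc of docs) {
    seen.add(doc[key]);
  }
  return seen.size;
}

// Made-up sample documents, for illustration only.
const docs = [{ id: "a" }, { id: "b" }, { id: "a" }];
console.log(countDistinct(docs, "id")); // 2
```

Documents that lack the property all contribute a single `undefined` entry, so filter them out first if they should not be counted.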

Upvotes: 2
