Reputation: 2750
Is there a direct function to count distinct elements in a CosmosDb query?
This is the default count:
SELECT value count(c.id) FROM c
And distinct works without count:
SELECT distinct c.id FROM c
But this returns a Bad Request - Syntax Error:
SELECT value count(distinct c.id) FROM c
How would count and distinct work together?
Upvotes: 16
Views: 39140
Reputation: 2454
Here is another query that solves the distinct issue and works for count. Basically, you need to encapsulate the distinct and then count. We have tested it with paging, for cases where you want the unique records and not just the count, and it works as well.
select value count(1) from c join (select distinct value c from p in c.products)
You can also use a where clause inside or outside of the brackets, depending on what your condition is based on.
This is also mentioned slightly differently in another answer here.
Check the select clause documentation for CosmosDB.
@ssmsexe brought this to my attention and I wanted to update the answer here.
Support for distinct was added on 19 Oct 2018.
The following query works just fine:
SELECT distinct value c FROM c join p in c.products
However, it still doesn't work for count.
The workaround for counting distinct is to create a stored procedure to perform the distinct count. It will basically query and continue until the end and return the count.
If you pass a distinct query like the one above to the stored procedure below, you will get a distinct count:
function count(queryCommand) {
    var response = getContext().getResponse();
    var collection = getContext().getCollection();
    var count = 0;

    query(queryCommand);

    function query(queryCommand, continuation) {
        var requestOptions = { continuation: continuation };
        var isAccepted = collection.queryDocuments(
            collection.getSelfLink(),
            queryCommand,
            requestOptions,
            function (err, feed, responseOptions) {
                if (err) {
                    throw err;
                }

                // Scan results
                if (feed) {
                    count += feed.length;
                }

                if (responseOptions.continuation) {
                    // Continue the query
                    query(queryCommand, responseOptions.continuation);
                } else {
                    // Return the count in the response
                    response.setBody(count);
                }
            });

        if (!isAccepted) throw new Error('The query was not accepted by the server.');
    }
}
The issue with that workaround is that it can potentially exceed the RU limit on your collection and fail. If that's the case, you can implement similar code in your own server-side application, which is not that great.
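A minimal sketch of the same "query and continue until the end" loop, moved out of the stored procedure and into application code. The `fetchPage` callback is hypothetical: it stands for whatever call your SDK offers to run the distinct query for one page, and is assumed to resolve to `{ items, continuation }`. `makeFakeFetchPage` is only a stand-in so the sketch runs without a database.

```javascript
// Sums the page sizes of a paged query, following continuation tokens
// until none is returned (the same logic as the stored procedure).
// fetchPage is a hypothetical callback, not an SDK function.
async function countAllPages(fetchPage) {
  let count = 0;
  let continuation;
  do {
    const page = await fetchPage(continuation);
    count += page.items.length;
    continuation = page.continuation;
  } while (continuation);
  return count;
}

// Stand-in for a real paged query: serves fixed pages and uses the
// index of the next page as the continuation token.
function makeFakeFetchPage(pages) {
  return async function (continuation) {
    const index = continuation ? Number(continuation) : 0;
    const next = index + 1 < pages.length ? String(index + 1) : undefined;
    return { items: pages[index], continuation: next };
  };
}
```

Because each page is requested separately, a throttled request can be retried on its own instead of failing the whole count.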
Upvotes: 11
Reputation: 551
I did some investigation and found a solution. To get the count of distinct results you cannot use count(1). You need to "wrap" the subquery with AS subqueryName and then use count(subqueryName), like below:
select count(subqueryName) from (SELECT distinct r.x FROM r) as subqueryName
Cheers!
Upvotes: 4
Reputation: 818
To count distinct elements you have to use COUNT and GROUP BY together. You don't need subqueries; it works in a very simple query like this example, where we want to list all the unique family last names in our container and the count of families sharing each name:
select count(1) as numfam, f.lastName from f group by f.lastName
result:
[
{
"numfam": 1
},
{
"numfam": 1,
"lastName": "Wakefield"
},
{
"numfam": 2,
"lastName": "Andersen"
}
]
Notice that one item in my collection has no lastName; that is the freedom of being schemaless.
Unfortunately, it is not yet possible to add an "order by" clause to this query, for example to sort the most common names in descending order. The Cosmos team has said it is working on this, so the feature is expected at some point. You can always sort the result in your client code.
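Sorting in client code can be sketched as follows, using the result shown above as input. The function name `sortByCountDescending` is made up for illustration.

```javascript
// Returns a new array ordered by numfam, most common name first.
// The input is copied so the caller's array is left untouched.
function sortByCountDescending(rows) {
  return [...rows].sort((a, b) => b.numfam - a.numfam);
}

const groupedRows = [
  { numfam: 1 },
  { numfam: 1, lastName: "Wakefield" },
  { numfam: 2, lastName: "Andersen" },
];
const sortedRows = sortByCountDescending(groupedRows);
console.log(sortedRows[0].lastName); // Andersen
```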
If you want to know the count for a specific name you can use this query (with a parametrized query you only need to supply the name in one place):
select "Andersen" as lastName, count(1) as numfam from f
where f.lastName = "Andersen"
result:
[
{
"lastName": "Andersen",
"numfam": 2
}
]
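A possible parametrized form of the query above, as a sketch. It assumes the `{ query, parameters }` query-spec shape used by the Azure Cosmos DB JavaScript SDK and assumes that a parameter may appear in the SELECT list; neither has been tested against a live account here, and `buildNameCountQuery` is a made-up helper name.

```javascript
// Builds a query spec in which the name is supplied once, as @lastName.
// The spec shape is an assumption based on the JavaScript SDK.
function buildNameCountQuery(lastName) {
  return {
    query:
      "SELECT @lastName AS lastName, COUNT(1) AS numfam " +
      "FROM f WHERE f.lastName = @lastName",
    parameters: [{ name: "@lastName", value: lastName }],
  };
}

const nameCountSpec = buildNameCountQuery("Andersen");
```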
Upvotes: 1
Reputation: 81
I know this is an old thread.
However, just to keep the topic updated: currently (Jul 2020) you can run SELECT DISTINCT over a Cosmos DB container, but directly applying COUNT(DISTINCT ...) doesn't give correct results. Hence, you need a workaround like the one below, which uses a subquery to get the correct distinct count:
SELECT COUNT(UniqueIDValues) AS UniqueCount
FROM (SELECT Id FROM c GROUP BY Id) AS UniqueIDValues
Upvotes: 8
Reputation: 141
How about SELECT COUNT(1) FROM (SELECT distinct c.id FROM c) AS t;? – Evaldas Buinauskas May 30 '18 at 14:44
As of 15 May 2019, the query from the comment above works with a WHERE condition. I didn't try it with a JOIN, but the request does return the answer I'm looking for.
It also works with the 100-element limitation in Cosmos DB.
For example, with a Product collection it would be:
SELECT COUNT(1) FROM (SELECT DISTINCT c.Id FROM c WHERE c.Brand = 'Coca')
Upvotes: 9
Reputation: 892
Azure Cosmos DB doesn't support the distinct keyword yet as part of the SQL API. The best way to achieve this is to use a stored procedure with custom code. Please find more details regarding the custom stored procedure here.
It seems the distinct keyword is under development.
Please find the reference link here.
Cosmos DB supports most of the aggregate functions; please see the list of supported aggregate functions here.
Please find more details in the following link.
Upvotes: 0
Reputation: 740
As far as I know, Cosmos DB does not support nested queries for now.
The only way to do what you want is to return all distinct ids as a query result and then count them. You can either do it directly in code or with the help of a stored procedure (which should be more efficient for a large number of docs).
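Counting directly in code could look like this sketch. It assumes the query results have already been loaded into an array of objects with an `id` property; `countDistinctIds` is a made-up name. A Set is used so that the count stays correct even if the same id shows up more than once, for example across pages.

```javascript
// Counts the distinct id values in an array of query results.
function countDistinctIds(docs) {
  return new Set(docs.map((doc) => doc.id)).size;
}

const distinctIdCount = countDistinctIds([
  { id: "a" },
  { id: "b" },
  { id: "a" },
]);
console.log(distinctIdCount); // 2
```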
Upvotes: 2