Yoel Acevedo

Reputation: 93

Django - Annotate Count() of distinct values grouped by Date

I have the following Model:

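from django.db import models

class Visualization(models.Model):
    # .... (other fields elided)
    # The target model names and on_delete are assumptions; the original only says "FK".
    user = models.ForeignKey('User', on_delete=models.CASCADE)
    start_time = models.DateTimeField()
    product = models.ForeignKey('Product', on_delete=models.CASCADE)
    # .... (other fields elided)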

Example data:

| User ID | Start Time | Product ID |
| :----: | :----------: | :-------: |
| 1 | 2021-09-07 14:03:07 | 3 |
| 2 | 2021-09-07 13:06:00 | 1 |
| 1 | 2021-09-07 17:03:06 | 1 |
| 4 | 2021-09-07 04:03:05 | 5 |
| 1 | 2021-09-07 15:03:17 | 4 |
| 1 | 2021-09-07 19:03:27 | 1 |
| 2 | 2021-09-06 21:03:31 | 3 |
| 1 | 2021-09-06 11:03:56 | 9 |
| 1 | 2021-09-06 07:03:19 | 9 |

I need to get the number of active users per day. An active user is one who made at least one reproduction that day; a user who made many reproductions on the same day still counts as 1.

A correct answer would be:

| Total | Date |
| :---: | :--: |
| 3 | 2021-09-07 |
| 2 | 2021-09-06 |
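To make the expected totals concrete, here is a minimal plain-Python sketch that recomputes them from the example data, without the ORM. The names `SAMPLE` and `active_users_per_day` are hypothetical and used only for illustration; the rows are copied from the example table.

```python
from collections import defaultdict
from datetime import datetime

# (user_id, start_time, product_id) rows copied from the example data.
SAMPLE = [
    (1, "2021-09-07 14:03:07", 3),
    (2, "2021-09-07 13:06:00", 1),
    (1, "2021-09-07 17:03:06", 1),
    (4, "2021-09-07 04:03:05", 5),
    (1, "2021-09-07 15:03:17", 4),
    (1, "2021-09-07 19:03:27", 1),
    (2, "2021-09-06 21:03:31", 3),
    (1, "2021-09-06 11:03:56", 9),
    (1, "2021-09-06 07:03:19", 9),
]


def active_users_per_day(rows):
    """Map each calendar day to the number of distinct users seen on it."""
    users_by_day = defaultdict(set)
    for user_id, start_time, _product_id in rows:
        day = datetime.strptime(start_time, "%Y-%m-%d %H:%M:%S").date()
        # A set keeps each user once per day, however many rows they have.
        users_by_day[day].add(user_id)
    return {day: len(users) for day, users in users_by_day.items()}


expected = active_users_per_day(SAMPLE)
```

This is only a reference for checking results on small samples; the goal is still to have the database do the grouping and distinct count.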

First I annotate a truncation of start_time that keeps only the date, and then I group by that annotation; up to this point everything works. The problem comes when I count the users, since the same user appears many times per day. I have tried counting user_id with distinct=True, but the totals are still wrong, and by a wide margin. I also tried grouping by both user_id and the period (the truncated start_time annotation), but that did not work either.

Real Data Example from 1 Day

| User ID | Start Time | Product ID |
| :----: | :----------: | :-------: |
|5852|2021-09-07 11:33:48.000000 +00:00|0|
|5852|2021-09-07 11:33:38.000000 +00:00|2|
|6697|2021-09-07 11:31:55.000000 +00:00|3|
|6697|2021-09-07 11:31:31.000000 +00:00|1|
|6643|2021-09-07 11:28:29.000000 +00:00|1598|
|2703|2021-09-07 11:19:05.000000 +00:00|1620|
|6697|2021-09-07 11:18:40.000000 +00:00|3|
|6697|2021-09-07 11:17:32.000000 +00:00|1|
|28295|2021-09-07 11:11:34.000000 +00:00|1618|
|6697|2021-09-07 11:11:33.000000 +00:00|3|
|23968|2021-09-07 10:54:25.000000 +00:00|0|
|6697|2021-09-07 10:53:05.000000 +00:00|1|
|6697|2021-09-07 10:52:53.000000 +00:00|3|
|6697|2021-09-07 10:50:44.000000 +00:00|1|
|11|2021-09-07 10:48:06.000000 +00:00|1478|
|23968|2021-09-07 10:47:53.000000 +00:00|0|
|23968|2021-09-07 10:45:22.000000 +00:00|0|
|28283|2021-09-07 10:20:18.000000 +00:00|1191|
|23968|2021-09-07 10:19:58.000000 +00:00|2|
|23968|2021-09-07 10:19:37.000000 +00:00|0|
|23968|2021-09-07 10:19:20.000000 +00:00|2|
|11|2021-09-07 09:09:22.000000 +00:00|1436|
|359|2021-09-07 09:08:59.000000 +00:00|88|
|359|2021-09-07 09:07:32.000000 +00:00|100|
|28275|2021-09-07 08:59:39.000000 +00:00|2|
|28275|2021-09-07 08:50:31.000000 +00:00|2|
|23968|2021-09-07 08:46:10.000000 +00:00|1572|
|23968|2021-09-07 08:45:42.000000 +00:00|2|
|359|2021-09-07 08:41:48.000000 +00:00|1550|
|23968|2021-09-07 08:26:42.000000 +00:00|0|
|23968|2021-09-07 08:19:21.000000 +00:00|2|
|23968|2021-09-07 08:18:14.000000 +00:00|0|
|23968|2021-09-07 08:16:33.000000 +00:00|0|
|2703|2021-09-07 07:01:28.000000 +00:00|1620|
|2703|2021-09-07 06:59:43.000000 +00:00|1620|
|6697|2021-09-07 02:51:50.000000 +00:00|0|
|6697|2021-09-07 02:46:18.000000 +00:00|2|
|10452|2021-09-07 00:15:03.000000 +00:00|421|
|27953|2021-09-07 00:12:35.000000 +00:00|1|

For this day, my query returns 20 instead of the expected 12.

Upvotes: 3

Views: 2345

Answers (2)

Md Shahbaz Ahmad

Reputation: 1156

You can use the extra() QuerySet modifier to group by date:

from django.db.models import Count

Visualization.objects.extra(
    select={'start_date': 'date( start_time )'}
).values(
    'start_date'
).annotate(
    total=Count('user', distinct=True)
)

Upvotes: 0

willeM_ Van Onsem

Reputation: 476493

You can make a query like:

from django.db.models import Count
from django.db.models.functions import TruncDate

Visualization.objects.values(
    date=TruncDate('start_time')
).annotate(
    total=Count('user', distinct=True)
).order_by('date')

For days on which no reproduction was made, there will be no row in the QuerySet, so you will need to post-process the result to fill in those dates.
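The post-processing could look something like the sketch below. It assumes the query result has been turned into a list of dicts with `date` (a `datetime.date`) and `total` keys, as the query above would yield; the helper name `fill_missing_days` and the sample rows are hypothetical.

```python
from datetime import date, timedelta


def fill_missing_days(rows, start, end):
    """Return one {'date', 'total'} dict per day in [start, end], with 0 for absent days."""
    totals = {row["date"]: row["total"] for row in rows}
    result = []
    day = start
    while day <= end:
        # Days missing from the query result get a total of 0.
        result.append({"date": day, "total": totals.get(day, 0)})
        day += timedelta(days=1)
    return result


# Stand-in for list(queryset); the values are made up for illustration.
rows = [
    {"date": date(2021, 9, 6), "total": 2},
    {"date": date(2021, 9, 8), "total": 1},
]
filled = fill_missing_days(rows, date(2021, 9, 6), date(2021, 9, 8))
```

Doing this in Python keeps the query simple; the date range to fill is a choice the caller has to make, since the database only knows about days that have rows.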

Upvotes: 3
