Reputation: 3311
I am trying to draw a day series graph, whereby it will show the number of employees of the division(s) in a day series. The problem here is that I want to find a way to limit to the division with the most employees so I dont need to draw too many lines and stress the database. (for example limit it to top 3 division with the most employee attendance)
I am using: Django 1.11.x Postgres 9.4
The goal is to create a day series graphing of this sort. has the grouping of division and the number of employee.
I have manage to achieve it with the following code:
from datetime import date, datetime
from django.db.models import Count
from django.db.models.functions import (
TruncDate, TruncDay, TruncHour, TruncMinute, TruncSecond,
)
emp_by_day = Attendance.objects.annotate(day=TruncDay('created_at')).values('day', 'division_id').annotate(cnt=Count('employee_id', distinct = True)).order_by('day')
for exp in emp_by_day:
print(exp['day'], exp['division_id'], exp['cnt'])
however, it currently output displays like this (i am generally happy but want to limit it):
employee count<->
division_id<->
<---day----------------->
2019-10-22 00:00:00+00:00 15 6
2019-10-22 00:00:00+00:00 16 6
2019-10-22 00:00:00+00:00 18 5
2019-10-22 00:00:00+00:00 20 4
2019-10-22 00:00:00+00:00 21 12 <-- largest 3
2019-10-22 00:00:00+00:00 25 14 <-- largest 3
2019-10-22 00:00:00+00:00 28 12 <-- largest 3
2019-10-23 00:00:00+00:00 15 6
2019-10-23 00:00:00+00:00 16 5
2019-10-23 00:00:00+00:00 18 2
2019-10-23 00:00:00+00:00 20 3
2019-10-23 00:00:00+00:00 21 14 <-- largest 3
2019-10-23 00:00:00+00:00 25 17 <-- largest 3
2019-10-23 00:00:00+00:00 28 13 <-- largest 3
2019-10-24 00:00:00+00:00 15 2
2019-10-24 00:00:00+00:00 16 6
2019-10-24 00:00:00+00:00 18 5
2019-10-24 00:00:00+00:00 20 4
2019-10-24 00:00:00+00:00 21 13 <-- largest 3
2019-10-24 00:00:00+00:00 25 12 <-- largest 3
2019-10-24 00:00:00+00:00 28 10 <-- largest 3
my goal is to produce this (limit it to the largest 3 division):
2019-10-22 00:00:00+00:00 21 12 <-- largest 3
2019-10-22 00:00:00+00:00 25 14 <-- largest 3
2019-10-22 00:00:00+00:00 28 12 <-- largest 3
2019-10-23 00:00:00+00:00 21 14 <-- largest 3
2019-10-23 00:00:00+00:00 25 17 <-- largest 3
2019-10-23 00:00:00+00:00 28 13 <-- largest 3
2019-10-24 00:00:00+00:00 21 13 <-- largest 3
2019-10-24 00:00:00+00:00 25 12 <-- largest 3
2019-10-24 00:00:00+00:00 28 10 <-- largest 3
Do let me know how can I produce such an intended output (limit it to the largest 3 division)
Upvotes: 5
Views: 1507
Reputation: 5016
You should use Rank()
window function to filter out results.
Logic:
Assuming that you want grouping per day
You should give a rank to each row based on the count cnt
value partitioned over day. The highest will get 1st rank and so on. Now you should filter out the result which has rank between 1 to 3.
Continuing your query
emp_by_day.annotate(rank=Window(
expression=Rank(),
order_by=F('cnt').desc(),
partition_by=[F('day')])).filter(rank__range=(1,3))
Note: In case of same value of cnt
for more than one row, the rank will be same for two or more rows. Hence you can get more than 3 rows. If you want only first 3 rows then use RowNumber()
instead of Rank()
.
Postgres Example Query:
select * from (
select *, rank() over (partition by day order by cnt desc) as rank from
(
select emp_id,day,count(emp_id) as cnt from attendance group by emp_id,day
order by day
) as T
) as Temp where rank between 1 and 3;
Replace rank()
by row_number()
to get only first 3 rows.
Update
Django 1.11 does not supports window()
. However you can refer following gist which back-ports this functionality from Django 2 to 1.11.
Note: I have not tested it. However, the OP creator tested it and it works.
Upvotes: 1
Reputation: 5249
First find out what the divisions you want to graph are (let's call that set best_divisions
) and then filter them in your query.
Attendance.objects.filter(division__in=best_divisions).annotate(day=…
In order to find the divisions you can for example:
best_divitions = Division.objects.annotate(
total_attendance=Count("attendance__employee", distinct=True),
).order_by("-total_attendance")[:3]
Upvotes: 1