Reputation: 743
MODELS
class ModelA(models.Model):
name = models.CharField()
class ModelB(models.Model):
MY_CHOICES = (
('X', 'X'),
('Y', 'Y'),
('Z', 'Z'),
)
modela = models.ForeignKey(ModelA, on_delete=models.CASCADE)
txt_1 = models.CharField(choices=MY_CHOICES)
txt_2 = models.CharField(choices=MY_CHOICES)
Given the simplified example above, how could I count how many times each of the choices values has been recorded given that there are Two fields that need to be counted?
Ideally the result would be something along the lines of:
{'X': 15, 'Y': 27, 'Z': 89}
I have tried the following, however in my real model, I have around 20 fields to count across and this is not giving the results I was hoping for:
ModelA.objects.values('modelb__txt1', 'modelb__txt2').annotate(Count('modelb__txt1', 'modelb__txt2'))
I have previously created huge dictionaries and manually sorted/counted the values, but this is now unmanageable and ugly.
Upvotes: 2
Views: 1222
Reputation: 477607
With one query, we can do this like:
from django.db.models import Count
qs = ModelB.objects.values('txt_1', 'txt_2').annotate(
cnt=Count('id')
).order_by('txt_1', 'txt_2')
But now we are still not there, since now we have for every combination of txt_1
and txt_2
the number of elements. We want to "flatten" this to every individual choice. We can do this for example by constructing a Counter
[Python-doc]:
from collections import Counter
result = Counter()
for row in qs:
result[row['txt_1']] += row['cnt']
result[row['txt_2']] += row['cnt']
For every row of this QuerySet
, we thus add the number (cnt
) to the two keys. This thus means that we count a row where both txt_1
, and txt_2
have value 'X'
twice.
A Counter
is a subclass of a dictionary, but if you want to cast it to a dict
ionary, you can later write:
result_dict = dict(result)
Options that are never selected, will not appear in the dictionary, since the queryset will not contain these, and thus we will never add these to the Counter
. But we can of course post-process the dictionary, and add 0 for these.
The above will work usually quite well. If the number of choices is however quite large, the processing will be more on the Python side, which is typically slower. We can then make a linearization, and work with two queries:
from collections import Counter
from django.db.models import Count
result = Counter()
for col in ['txt_1', 'txt_2']:
qs = ModelB.objects.values(col).annotate(cnt=Count('id')).order_by(col)
result.update({q[col]: q['cnt'] for q in qs})
This will here reduce in two queries. But in this case each query will (at most) return three rows. Whereas the other approach will result in one query returning (at most) nine rows. For a small amount of rows, that is not a problem. But the number of cases can easily grow exponential in the number of columns.
Upvotes: 1