Karl
Karl

Reputation: 743

How to count values across fields in Django?

MODELS

class ModelA(models.Model):
    name = models.CharField()

class ModelB(models.Model):
    MY_CHOICES = (
        ('X', 'X'),
        ('Y', 'Y'),
        ('Z', 'Z'),
    )
    modela = models.ForeignKey(ModelA, on_delete=models.CASCADE)
    txt_1 = models.CharField(choices=MY_CHOICES)
    txt_2 = models.CharField(choices=MY_CHOICES)

Given the simplified example above, how could I count how many times each of the choices values has been recorded given that there are Two fields that need to be counted?

Ideally the result would be something along the lines of:

{'X': 15, 'Y': 27, 'Z': 89}

I have tried the following, however in my real model, I have around 20 fields to count across and this is not giving the results I was hoping for:

ModelA.objects.values('modelb__txt1', 'modelb__txt2').annotate(Count('modelb__txt1', 'modelb__txt2'))

I have previously created huge dictionaries and manually sorted/counted the values, but this is now unmanageable and ugly.

Upvotes: 2

Views: 1222

Answers (1)

willeM_ Van Onsem
willeM_ Van Onsem

Reputation: 477607

With one query (for a limited number of columns)

With one query, we can do this like:

from django.db.models import Count

qs = ModelB.objects.values('txt_1', 'txt_2').annotate(
    cnt=Count('id')
).order_by('txt_1', 'txt_2')

But now we are still not there, since now we have for every combination of txt_1 and txt_2 the number of elements. We want to "flatten" this to every individual choice. We can do this for example by constructing a Counter [Python-doc]:

from collections import Counter

result = Counter()
for row in qs:
    result[row['txt_1']] += row['cnt']
    result[row['txt_2']] += row['cnt']

For every row of this QuerySet, we thus add the number (cnt) to the two keys. This thus means that we count a row where both txt_1, and txt_2 have value 'X' twice.

A Counter is a subclass of a dictionary, but if you want to cast it to a dictionary, you can later write:

result_dict = dict(result)

Options that are never selected, will not appear in the dictionary, since the queryset will not contain these, and thus we will never add these to the Counter. But we can of course post-process the dictionary, and add 0 for these.

With n queries (with n the number of columns)

The above will work usually quite well. If the number of choices is however quite large, the processing will be more on the Python side, which is typically slower. We can then make a linearization, and work with two queries:

from collections import Counter
from django.db.models import Count

result = Counter()
for col in ['txt_1', 'txt_2']:
    qs = ModelB.objects.values(col).annotate(cnt=Count('id')).order_by(col)
    result.update({q[col]: q['cnt'] for q in qs})

This will here reduce in two queries. But in this case each query will (at most) return three rows. Whereas the other approach will result in one query returning (at most) nine rows. For a small amount of rows, that is not a problem. But the number of cases can easily grow exponential in the number of columns.

Upvotes: 1

Related Questions