Amiribus
Amiribus

Reputation: 11

Count on big Django Datas

I have a LogTable with about 500k entries per week.

models.py

class Log(models.Model):
   timestamp= models.DateTimeField()
   name = models.CharField(max_length=30)
   status = models.PositiveIntegerField()
   objects = LogManager()

I want to group the entries by the timestamp and get the counted numbers of each status per timestamp back. Like this:

timestamp      | automated | offline | user | inactive
29.10.20 17:40 |    5      |   40    |  30  |    15
29.10.20 17:45 |    10     |  ....

I tried this with a Manager like this:

class LogManager(models.Manager):
def auto(self, timestamp):
    return self.filter(datetime__exact=timestamp).filter(status__exact=0).count()
def inactive(self, timestamp):
    return self.filter(datetime__exact=timestamp).filter(status__exact=1).count()
def offline(self, timestamp):
    return self.filter(datetime__exact=timestamp).filter(status__exact=2).count()
def user(self, timestamp):
    return self.filter(datetime__exact=timestamp).filter(status__exact=3).count()
def activity(self, timestamp):
    data = {
        'timestamp': timestamp,
        'auto' : self.auto(timestamp),
        'inactive' : self.inactive(timestamp),
        'offline': self.offline(timestamp),
        'user': self.user(timestamp),
    }
    return data
    def activity_sum(self):
       obj = self.values_list('datetime', flat=True)
       data = {}
       for i, time in enumerate(obj):
          data[i] = self.activity(time)
       return data

But this can't be the way because it lasts about 10 minutes to calculate the results if I call Log.objects.activity_sum(). I feel like there is a simple answer but I just can't find it.

Thanks for helping me out.

Upvotes: 0

Views: 112

Answers (1)

MaximeK
MaximeK

Reputation: 2061

My first read, told me to never do queries in a loop, that will be your first improvement.

Did datetime field is an index ?

Did you try to ask for 1 database result order by datetime and split it in Django, sometime database accees is more costly than doing your split in Python.

https://docs.djangoproject.com/fr/3.0/ref/models/indexes/ https://stackoverflow.com/a/37382695/5073377

Upvotes: 1

Related Questions