1010101
1010101

Reputation: 991

django query aggregate function is slow?

I am working with Django to see how to handle large databases. I use a database with fields name, age, date of birth(dob) and height. The database has about 500000 entries. I have to find the average height of persons of (1) same age and (2) born in same year. The aggregate function in querying table takes about 10s. Is it usual or am I missing something?

For age:

age = [i[0] for i in Data.objects.values_list('age').distinct()]
ht = []
for each in age:
    aggr = Data.objects.filter(age=each).aggregate(ag_ht=Avg('height')
    ht.append(aggr)

From dob,

age = [i[0].year for i in Data.objects.values_list('dob').distinct()]
for each in age:
    aggr = Data.objects.filter(dob__contains=each).aggregate(ag_ht=Avg(‌​'height')
    ht.append(aggr)

The year has to be extracted from dob. It is SQLite and I cannot use __year (join).

Upvotes: 0

Views: 1648

Answers (2)

user8060120
user8060120

Reputation:

full version with time compare loop and query set version

import time
from dd.models import Data
from django.db.models import Avg
from django.db.models.functions import ExtractYear

for age

start = time.time()
age = [i[0] for i in Data.objects.values_list('age').distinct()]
ht = []

for each in age:
    aggr = Data.objects.filter(age=each).aggregate(ag_ht=Avg('height'))
    ht.append(aggr)

end = time.time()
loop_time = end - start

start = time.time()
qs = Data.objects.values('age').annotate(ag_ht=Avg('height')).order_by('age')
ht_qs = qs.values_list('age', 'ag_ht')
end = time.time()
qs_time = end - start

print loop_time / qs_time

for dob year, with easy refactoring your version(add set in the years)

start = time.time()
years = set([i[0].year for i in Data.objects.values_list('dob').distinct()])
ht_year_loop = []
for each in years:
    aggr = Data.objects.filter(dob__contains=each).aggregate(ag_ht=Avg('height'))
    ht_year_loop.append((each, aggr.get('ag_ht')))

end = time.time()
loop_time = end - start

start = time.time()
qs = Data.objects.annotate(dob_year=ExtractYear('dob')).values('dob_year').annotate(ag_ht=Avg('height'))
ht_qs = qs.values_list('dob_year', 'ag_ht')
end = time.time()
qs_time = end - start

print loop_time / qs_time

Upvotes: 0

CL.
CL.

Reputation: 180290

For these queries to be efficient, you have to create indexes on the age and dob columns.

You will get a small additional speedup by using covering indexes, i.e., using two-column indexes that also include the height column.

Upvotes: 1

Related Questions