Reputation: 736
Is it possible to perform calculations between records in a Django query?
I know how to perform calculations across records (e.g. data_a + data_b). Is there way to perform say the percent change between data_a row 0 and row 4 (i.e. 09-30-17 and 09-30-16)?
+-----------+--------+--------+
| date | data_a | data_b |
+-----------+--------+--------+
| 09-30-17 | 100 | 200 |
| 06-30-17 | 95 | 220 |
| 03-31-17 | 85 | 205 |
| 12-31-16 | 80 | 215 |
| 09-30-16 | 75 | 195 |
+-----------+--------+--------+
I am currently using Pandas to perform these type of calculations, but would like eliminate this additional step if possible.
Upvotes: 2
Views: 772
Reputation: 15371
There is no row 0 in a Django database, so we'll assume rows 1 and 5.
The general formula for calculation of percentage as expressed in Python is:
((b - a) / a) * 100
where a is the starting number and b is the ending number. So in your example:
a = 100
b = 75
((b - a) / a) * 100
-25.0
If your model is called Foo, the queries you want are:
(a, b) = Foo.objects.filter(id__in=[id_1, id_2]).values_list('data_a', flat=True)
values_list
says "get just these fields" and flat=True
means you want a simple list of values, not key/value pairs. By assigning it to the (a, b)
tuple and using the __in=
clause, you get to do this as a single query rather than as two.
I would wrap it all up into a standalone function or model method:
def pct_change(id_1, id_2):
# Get a single column from two rows and return percentage of change
(a, b) = Foo.objects.filter(id__in=[id_1, id_2]).values_list('data_a', flat=True)
return ((b - a) / a) * 100
And then if you know the row IDs in the db for the two rows you want to compare, it's just:
print(pct_change(233, 8343))
If you'd like to calculate progressively the change between row 1 and row 2, then between row 2 and row 3, and so on, you'd just run this function sequentially for each row in a queryset. Because row IDs might have gaps we can't just use n+1 to compute the next row. Instead, start by getting a list of all the row IDs in a queryset:
rows = [r.id for r in Foo.objects.all().order_by('date')]
Which evaluates to something like
rows = [1,2,3,5,6,9,13]
Now for each elem in list and the next elem in list, run our function:
for (index, row) in enumerate(rows):
if index < len(rows):
current, next_ = row, rows[index + 1]
print(current, next_)
print(pct_change(current, next_))
Upvotes: 1
Reputation: 71
I would go with a Database cursor raw SQL (see https://docs.djangoproject.com/en/2.0/topics/db/sql/)
combined with a Lag() window function as so:
result = cursor.execute("""
select date,
data_a - lag(data_a) over (order by date) as data_change,
from foo;""")
This is the general idea, you might need to change it according to your needs.
Upvotes: 1