ac2001
ac2001

Reputation: 736

Django ORM calculations between records

Is it possible to perform calculations between records in a Django query?

I know how to perform calculations across records (e.g. data_a + data_b). Is there way to perform say the percent change between data_a row 0 and row 4 (i.e. 09-30-17 and 09-30-16)?

+-----------+--------+--------+
|   date    | data_a | data_b |
+-----------+--------+--------+
| 09-30-17  |    100 |    200 |
| 06-30-17  |     95 |    220 |
| 03-31-17  |     85 |    205 |
| 12-31-16  |     80 |    215 |
| 09-30-16  |     75 |    195 |
+-----------+--------+--------+

I am currently using Pandas to perform these type of calculations, but would like eliminate this additional step if possible.

Upvotes: 2

Views: 772

Answers (2)

shacker
shacker

Reputation: 15371

There is no row 0 in a Django database, so we'll assume rows 1 and 5.

The general formula for calculation of percentage as expressed in Python is:

((b - a) / a) * 100

where a is the starting number and b is the ending number. So in your example:

a = 100
b = 75
((b - a) / a) * 100
-25.0

If your model is called Foo, the queries you want are:

(a, b) = Foo.objects.filter(id__in=[id_1, id_2]).values_list('data_a', flat=True)

values_list says "get just these fields" and flat=True means you want a simple list of values, not key/value pairs. By assigning it to the (a, b) tuple and using the __in= clause, you get to do this as a single query rather than as two.

I would wrap it all up into a standalone function or model method:

def pct_change(id_1, id_2):
    # Get a single column from two rows and return percentage of change
    (a, b) = Foo.objects.filter(id__in=[id_1, id_2]).values_list('data_a', flat=True)
    return ((b - a) / a) * 100

And then if you know the row IDs in the db for the two rows you want to compare, it's just:

print(pct_change(233, 8343))

If you'd like to calculate progressively the change between row 1 and row 2, then between row 2 and row 3, and so on, you'd just run this function sequentially for each row in a queryset. Because row IDs might have gaps we can't just use n+1 to compute the next row. Instead, start by getting a list of all the row IDs in a queryset:

rows = [r.id for r in Foo.objects.all().order_by('date')]

Which evaluates to something like

rows = [1,2,3,5,6,9,13]

Now for each elem in list and the next elem in list, run our function:

for (index, row) in enumerate(rows):
    if index < len(rows):
        current, next_ = row, rows[index + 1]
        print(current, next_)
        print(pct_change(current, next_))

Upvotes: 1

Tomb
Tomb

Reputation: 71

I would go with a Database cursor raw SQL (see https://docs.djangoproject.com/en/2.0/topics/db/sql/)

combined with a Lag() window function as so:

result = cursor.execute("""
    select date, 
       data_a - lag(data_a) over (order by date) as data_change,
from foo;""")

This is the general idea, you might need to change it according to your needs.

Upvotes: 1

Related Questions