Akshay Shah
Akshay Shah

Reputation: 490

Converting complex GROUP BY and INNER JOIN SQL statement to Django ORM Level

I am trying to convert my below mention SQL query to Django ORM layer query but I was not able to get the perfect output as provided by the SQL statement. Models

class YearlyTable(models.Model):

   class Meta:
       db_table = 'yearlytable'
       managed = True

   user_id = models.IntegerField(db_index=True)
   rotations = models.IntegerField()
   calories = models.FloatField()
   distance = models.FloatField()
   duration = models.IntegerField(default=0)
   year = models.IntegerField()
   created = models.DateTimeField(auto_now_add=True)
   modified = models.DateTimeField(auto_now=True)

class User(AbstractBaseUser):

    class Meta:
        db_table = 'users'
        managed = True

    email = models.EmailField(max_length=255, unique=True)
    first_name = models.CharField(max_length=255, blank=True, null=True)
    city = models.CharField(max_length=200, blank=True, null=True)
    state = models.CharField(max_length=200, blank=True, null=True)
    postal_code = models.IntegerField(blank=True, null=True)
    country = models.CharField(max_length=200, blank=True, null=True)

SELECT 
users.state,
sum(yearlytable.rotations) as sum_rotations,
sum(yearlytable.calories) as sum_calories,
sum(yearlytable.distance) as sum_distance
    FROM yearlytable
    INNER JOIN users on (yearlytable.user_id = users.id)
    WHERE yearlytable.user_id in(SELECT id FROM users WHERE country LIKE 'United States%' and  NOT ("email" LIKE '%yopmail.com%'))
    GROUP BY users.state

Then I tried to execute the above-mentioned query using RAW Django Query Example:

User.objects.raw('select users.state,sum(yearlytable.rotations) as sum_rotations,sum(yearlytable.calories) as sum_calories,sum(yearlytable.distance) as sum_distance from yearlytable inner join users on (yearlytable.user_id = users.id) where yearlytable.user_id in(select id from users where country like \'United States%\' and  NOT ("email" LIKE \'%yopmail.com%\')) group by users.state;')

But this also didn't work. Now I don't want to use CURSOR for this as I am afraid of SQL Injection issue. So Cursor is off the table.

for u in User.objects.raw('select users.state,sum(yearlytable.rotations) as sum_rotations,sum(yearlytable.calories) as sum_calories,sum(yearlytable.distance) as sum_distance from yearlytable inner join users on (yearlytable.user_id = users.id) where yearlytable.user_id in(select id from users where country like \'United States%\' and  NOT ("email" LIKE \'%yopmail.com%\')) group by users.state;'):
            print u

Below is the stack trace:

    Traceback:
File "/home/akki/rest_api/venv/local/lib/python2.7/site-packages/django/core/handlers/base.py" in get_response
  111.                     response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/home/akki/rest_api/venv/local/lib/python2.7/site-packages/django/views/decorators/csrf.py" in wrapped_view
  57.         return view_func(*args, **kwargs)
File "/home/akki/rest_api/venv/local/lib/python2.7/site-packages/django/views/generic/base.py" in view
  69.             return self.dispatch(request, *args, **kwargs)
File "/home/akki/rest_api/venv/local/lib/python2.7/site-packages/rest_framework/views.py" in dispatch
  407.             response = self.handle_exception(exc)
File "/home/akki/rest_api/venv/local/lib/python2.7/site-packages/rest_framework/views.py" in dispatch
  404.             response = handler(request, *args, **kwargs)
File "/home/akki/rest_api/venv/local/lib/python2.7/site-packages/rest_framework/decorators.py" in handler
  51.             return func(*args, **kwargs)
File "/home/akki/rest_api/widget/views.py" in heat_map
  18.         for u in User.objects.raw('select users.state,sum(yearlytable.rotations) as sum_rotations,sum(yearlytable.calories) as sum_calories,sum(yearlytable.distance) as sum_distance from yearlytable inner join users on (yearlytable.user_id = users.id) where yearlytable.user_id in(select id from users where country like \'United States%\' and  NOT ("email" LIKE \'%yopmail.com%\')) group by users.state;'):
File "/home/akki/rest_api/venv/local/lib/python2.7/site-packages/django/db/models/query.py" in __iter__
  1535.         query = iter(self.query)
File "/home/akki/rest_api/venv/local/lib/python2.7/site-packages/django/db/models/sql/query.py" in __iter__
  76.         self._execute_query()
File "/home/akki/rest_api/venv/local/lib/python2.7/site-packages/django/db/models/sql/query.py" in _execute_query
  90.         self.cursor.execute(self.sql, self.params)
File "/home/akki/rest_api/venv/local/lib/python2.7/site-packages/django/db/backends/utils.py" in execute
  81.             return super(CursorDebugWrapper, self).execute(sql, params)
File "/home/akki/rest_api/venv/local/lib/python2.7/site-packages/django/db/backends/utils.py" in execute
  65.                 return self.cursor.execute(sql, params)

The Django ORM which I tried was:

YearlyTable.objects.annotate(r=Sum('rotations'))

It would be great to convert this sql query to django orm level.

Upvotes: 0

Views: 651

Answers (2)

Bobby Chowdhury
Bobby Chowdhury

Reputation: 334

Assumptions:

  1. Use django ORM without resorting to raw SQL
  2. Design the django models idiomatically, meaning related tables should use models ForeignKey, OneonOne or ManytoMany attributes.
  3. YearlyTable assumed to have a one to one relationship with user.

In models.py:

from django.db import models
from django.contrib.auth.models import AbstractBaseUser

class User(AbstractBaseUser):
    email = models.EmailField(max_length=255, unique=True)
    first_name = models.CharField(max_length=255, blank=True, null=True)
    city = models.CharField(max_length=200, blank=True, null=True)
    state = models.CharField(max_length=200, blank=True, null=True)
    postal_code = models.IntegerField(blank=True, null=True)
    country = models.CharField(max_length=200, blank=True, null=True)


    def __unicode__(self):
        return self.email


class YearlyTable(models.Model):
    user =  models.OneToOneField('User', unique=True)
    rotations = models.IntegerField()
    calories = models.FloatField()
    distance = models.FloatField()
    duration = models.IntegerField(default=0)
    year = models.IntegerField()
    created = models.DateTimeField(auto_now_add=True)
    modified = models.DateTimeField(auto_now=True)

    def __unicode__(self):
        return str(self.user)

I populated the tables with the following sample data:


    u = User(email='[email protected]', first_name='ab', city='New York', state='New York', postal_code='12345', country='United States')
    y = YearlyTable(user=u, rotations=10, calories=10.8, distance=12.5, duration=20, year=2011)

    u = User(email='[email protected]', first_name='ac', city='Buffalo', state='New York', postal_code='67891', country='United States') 
    y = YearlyTable(user=u, rotations=8, calories=11.8, distance=11.5, duration=30, year=2012)

    u = User(email='[email protected]', first_name='ad', city='Rochester', state='New York', postal_code='13579', country='United States')
    y = YearlyTable(user=u, rotations=20, calories=15.8, distance=13.5, duration=40, year=2013)

    u = User(email='[email protected]', first_name='ae', city='Pittsburgh', state='Pennsylvania', postal_code='98765', country='United States')
    y = YearlyTable(user=u, rotations=30, calories=10.2, distance=12.5, duration=40, year=2012)

    u = User(email='[email protected]', first_name='af', city='Los Angeles', state='California', postal_code='97531', country='United States')
    y = YearlyTable(user=u, rotations=10, calories=14.8, distance=13.5, duration=10, year=2010)

Checking the physical tables and querying directly against it


    psql -d 

    # select * from testapp_user;
     id | password | last_login |  email  | first_name |    city     |    state     | postal_code |    country
    ----+----------+------------+---------+------------+-------------+--------------+-------------+---------------
      1 |          |            | [email protected] | ab         | New York    | New York     |       12345 | United States
      2 |          |            | [email protected] | ac         | Buffalo     | New York     |       67891 | United States
      3 |          |            | [email protected] | ad         | Rochester   | New York     |       13579 | United States
      4 |          |            | [email protected] | ae         | Pittsburgh  | Pennsylvania |       98765 | United States
      5 |          |            | [email protected] | af         | Los Angeles | California   |       97531 | United States
    (5 rows)

    # select * from testapp_yearlytable;
     id | rotations | calories | distance | duration | year |            created            |           modified            | user_id
    ----+-----------+----------+----------+----------+------+-------------------------------+-------------------------------+---------
      1 |        10 |     10.8 |     12.5 |       20 | 2011 | 2016-05-17 16:23:46.39941+00  | 2016-05-17 16:23:46.399445+00 |       1
      3 |         8 |     11.8 |     11.5 |       30 | 2012 | 2016-05-17 16:24:26.264569+00 | 2016-05-17 16:24:26.264606+00 |       2
      4 |        20 |     15.8 |     13.5 |       40 | 2013 | 2016-05-17 16:24:51.200739+00 | 2016-05-17 16:24:51.200785+00 |       3
      5 |        30 |     10.2 |     12.5 |       40 | 2012 | 2016-05-17 16:25:08.187799+00 | 2016-05-17 16:25:08.187852+00 |       4
      6 |        10 |     14.8 |     13.5 |       10 | 2010 | 2016-05-17 16:25:24.846284+00 | 2016-05-17 16:25:24.846324+00 |       5
    (5 rows)


    # SELECT
    testapp_user.state,
    sum(testapp_yearlytable.rotations) as sum_rotations,
    sum(testapp_yearlytable.calories) as sum_calories,
    sum(testapp_yearlytable.distance) as sum_distance
    FROM testapp_yearlytable
    INNER JOIN  testapp_user on (testapp_yearlytable.user_id = testapp_user.id)
    WHERE testapp_yearlytable.user_id in
    (SELECT id FROM testapp_user
    WHERE country LIKE 'United States%' and
    NOT ("email" LIKE '%[email protected]%'))
    GROUP BY testapp_user.state;

        state     | sum_rotations | sum_calories | sum_distance
    --------------+---------------+--------------+--------------
     New York     |            28 |         27.6 |           25
     Pennsylvania |            30 |         10.2 |         12.5
     California   |            10 |         14.8 |         13.5

Running in python shell


    > python manage.py shell
    Python 2.7.6 (default, Jun 22 2015, 18:00:18)
    [GCC 4.8.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    (InteractiveConsole)
    >>> from testapp.models import User, YearlyTable
    >>> from django.db.models import Q, Sum
    >>> User.objects.filter(~Q(email__icontains='[email protected]'), country__startswith='United States') \
    ... .values('state') \
    ... .annotate(sum_rotations = Sum('yearlytable__rotations'), \
    ... sum_calories = Sum('yearlytable__calories'), \
    ... sum_distance = Sum('yearlytable__distance'))
    [{'sum_rotations': 28, 'state': u'New York', 'sum_calories': 27.6, 'sum_distance': 25.0}, {'sum_rotations': 30, 'state': u'Pennsylvania', 'sum_calories': 10.2, 'sum_distance': 12.5}, {'sum_rotations': 10, 'state': u'California', 'sum_calories': 14.8, 'sum_distance': 13.5}]

Upvotes: 3

miketheredherring
miketheredherring

Reputation: 21

It seems like this can be done using the Aggregation Framework with the following ORM query:

1) We filter on the User to find those which match the inner most SELECT statement. This is returns a list of the User.id.

2) values() is used first on the YearlyTable will perform the GROUP BY on User.state.

3) distinct() is used to ensure we only account for each possible User.state once.

4) annotate() is used to perform the Sum of the values you wanted.

5) Finally we call values() again to make dictionaries containing the information you requested in the top level SELECT query.

from django.db.models import Sum

YearlyTable.objects.filter(
    user_id__in=User.objects.filter(
        country__startswith='United States'
    ).exclude(
        email__contains='yopmail.com'
    ).values_list('id', flat=True)
).values('user__state').distinct().annotate(
    sum_rotations=Sum('rotations'),
    sum_calories=Sum('calories'),
    sum_distance=Sum('distance')
).values('user__state', 'sum_rotations', 'sum_calories', 'sum_distance')

Upvotes: 1

Related Questions