HTF
HTF

Reputation: 7280

Django ORM query, distinct values and join the same table

I would like to return all columns based on distinct values from column site where hide = 0 and order it by date from created. I'm aware that distinct() call with specified field names is currently supported only by PostgresSQL but I'm running MySQL. I've got a working SQL query (it's probably not very efficient) but not sure how to convert it to Django ORM.

models.py

from django.db import models
from django.utils import timezone

# Create your models here.

class Results(models.Model):
    user_ip = models.GenericIPAddressField(unpack_ipv4=True)
    site_ip = models.GenericIPAddressField(unpack_ipv4=True)
    site = models.URLField()
    reason = models.CharField(max_length=50)
    hide = models.BooleanField(default=False)
    created = models.DateTimeField(default=timezone.now)

    def __str__(self):
        return self.site

Table structure:

mysql > SHOW CREATE TABLE results\G
*************************** 1. row ***************************
       Table: results
Create Table: CREATE TABLE `results` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `user_ip` char(39) NOT NULL,
  `site_ip` char(39) NOT NULL,
  `site` varchar(200) NOT NULL,
  `reason` varchar(50) NOT NULL,
  `hide` tinyint(1) NOT NULL,
  `created` datetime(6) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=836 DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

SQL query:

SELECT * FROM 
(
    SELECT site, MAX(created) created 
    FROM results 
    GROUP BY site 
    ORDER BY MAX(created) DESC 
    LIMIT 10
) _d 
JOIN results USING (site, created) 
ORDER BY _d.created DESC

Upvotes: 2

Views: 2039

Answers (2)

HTF
HTF

Reputation: 7280

I've got a workaround but I was wondering if there is a better way to do it because this requires two database hits:

views.py

recent_results_ids = []
[recent_results_ids.append(i.id) for i in Results.objects.raw('SELECT MAX(id) id FROM results WHERE hide = 0 GROUP BY site ORDER BY MAX(created) DESC LIMIT 10')]
recent_results = Results.objects.filter(id__in=recent_results_ids).order_by('-id')

Upvotes: 0

Risadinha
Risadinha

Reputation: 16666

There is module that allows you to group Django models and still work with a QuerySet in the result: https://github.com/kako-nawao/django-group-by

For example:

from django_group_by import GroupByMixin

class ResultsQuerySet(QuerySet, GroupByMixin):
    pass

class Results(Model):
    # your model

class GroupedResultsListView(ListView):
    template_name = 'xxx/results.html'
    model = Results

    def get_queryset(self):
        return Results.objects.group_by('site').annotate(
            max_created=Max('created')).order_by(
            'created').distinct()
        # order by 'max_created' might also work

'xxx/results.html'

<ul>
{% for result in object_list %}
    <li>
        <h2>{{ result.site }}</td>
        <p>{{ result.max_created }}</p>
    </li>
{% endfor %}
</ul>

The difference to the annotate/aggregate basic Django queries is the use of the attributes of a related field, e.g. result.site. You can also group by several attributes in one query, e.g.

Results.objects.group_by('site', 'user_ip')

If you need the PKs of the instances that have been grouped together, add the following annotation:

.annotate(pks=ArrayAgg('id'))

NOTE: ArrayAgg is a Postgres specific function, available from Django 1.9 onwards: https://docs.djangoproject.com/en/1.10/ref/contrib/postgres/aggregates/#arrayagg

Upvotes: 1

Related Questions