jaap3
jaap3

Reputation: 2846

How to get a subquery in FROM clause in Django ORM

I'm trying to express the following (Postgres) SQL statement using the Django ORM:

SELECT 
    v.id, v.min_salary, v.max_salary, v.min_weekly_hours, v.max_weekly_hours
    p.min_start_date, p.max_end_date
FROM 
    vacancy v,
    (
        SELECT 
        id, vacancy_id, MIN(start_date) min_start_date, MAX(end_date) AS max_end_date 
        FROM vacancypublication
        WHERE (active = True AND site_id = 1 AND start_date <= CURRENT_TIMESTAMP) 
        GROUP BY id, vacancy_id
    ) p
WHERE  
    p.vacancy_id = v.id AND
    v.workflow_status = 'A'
ORDER BY p.min_start_date DESC;

The problem is that I'm using a subquery in the FROM clause (also known as a "inline-view").

I've tried using .extra(tables=['...']) but Django adds quotes to the statement, making the SQL invalid.

I'd rather not resort to a .raw query. Is there a way to do this? Maybe through a reusable app if the core API doesn't provide a way.

EDIT:

This is the (seemingly) equivalent statement using a join:

    SELECT
    v.id, v.code, v.min_salary, v.max_salary, v.min_weekly_hours, v.max_weekly_hours, v.application_email, v.application_url, v.available_positions,
    MIN(CASE WHEN (p.active = True AND p.site_id = 1 AND p.start_date <= CURRENT_TIMESTAMP) THEN p.start_date ELSE NULL END) AS start_date, 
    MAX(CASE WHEN (p.active = True AND p.site_id = 1) THEN p.end_date ELSE NULL END) AS end_date
FROM base_vacancy v 
LEFT OUTER JOIN 
    base_vacancypublication p ON v.id = p.vacancy_id
WHERE v.workflow_status = 'A'
GROUP BY v.id, v.code, v.min_salary, v.max_salary, v.min_weekly_hours, v.max_weekly_hours, v.application_email, v.application_url, v.available_positions
HAVING MIN(CASE WHEN (p.active = True AND p.site_id = 1 AND p.start_date <= CURRENT_TIMESTAMP) THEN p.start_date ELSE NULL END) IS NOT NULL 
ORDER BY start_date DESC;

It's about ~3 times as slow, but it's possible to write this using Django 1.9 ORM methods:

Vacancy.objects.annotate(
    start_date=Min(
        Case(
            When(publication_set__is_active=True, publication_set__site_id=1, publication_set__start_date__lte=Now(), then='publication_set__start_date'),
            default=None
        )
    ),
    end_date=Max(
        Case(
            When(publication_set__is_active=True, publication_set__site_id=1, then='publication_set__end_date'),
            default=None
        )
    )
).filter(
    start_date__isnull=False, status=Workflow.APPROVED
).order_by(
    '-start_date'
)

Upvotes: 3

Views: 991

Answers (2)

Nayem Jaman Tusher
Nayem Jaman Tusher

Reputation: 1078

you can construct the equivalent Django's ORM query:

from django.db.models import Q, F, Min, Max, Case, When
from django.db.models.functions import Now

approved_status = 'A'

# used annotate  with start_date and end_date
vacancies = Vacancy.objects.annotate(
    start_date=Min(
        Case(
            When(
                Q(publications__active=True, publications__site_id=1, publications__start_date__lte=Now()),
                then='publications__start_date'
            ),
            default=None
        )
    ),
    end_date=Max(
        Case(
            When(
                Q(publications__active=True, publications__site_id=1),
                then='publications__end_date'
            ),
            default=None
        )
    )
).filter(
    start_date__isnull=False, workflow_status=approved_status
).order_by('-start_date')

vacancies = vacancies.values(
    'id', 'min_salary', 'max_salary', 'min_weekly_hours', 'max_weekly_hours', 'start_date', 'end_date'
)

Upvotes: 0

What's the point of the aggregations being conditional if you are excluding them on the HAVING clause when they are NULL? Instead, you can simplify them adding the conditions to the WHERE clause (executed before aggregations)

Upvotes: 0

Related Questions