Ali
Ali

Reputation: 3666

How do I translate this SQL query to Django?

I have an SQL query in PostgreSQL that I'd like to translate to Django.

select * from main_document 
where id in (
    select distinct on (document_id) document_id 
    from main_tokenindex 
    where token in('token1', 'token2')
    order by document_id, relevance desc
) LIMIT 100

I have 2 tables: Document and TokenIndex. 1 to many relationship, a token can be in many documents.

I have this so far:

terms = []
ids = [doc.document_id for doc in TokenIndex.objects.filter(token__in = terms).
      distinct('document__id').order_by("-relevance")]

list(Document.objects.filter(pk__in=ids))[:max_res]

As you can see, the problem is that I'm going to the database to get the list of ids, and then going back again to get the documents. This is inefficient because I might be dealing with millions of document ids, whereas I'm only interested in a small subset (defined by the max_res variable, and LIMIT in the SQL.

How do I translate the SQL query to Django? I want the Django's query to be like the one I wrote by hand in the sense that it only returns 100 documents for example and not 1.000.000 document ids and then 100 documents.

Upvotes: 0

Views: 516

Answers (1)

DrTyrsa
DrTyrsa

Reputation: 31951

result = Document.objects.filter(pk__in=(TokenIndex.objects.filter(token__in=terms).distinct('document').order_by('document', '-relevance').values_list('document', flat=True)[:max_res]))

If you don't want 1.000.000 document ids, just return 100, you need LIMIT in inner query, not in outer query.

Anyway, I think if you even use LIMIT with outer query it will still translate to the SQL you want.

Upvotes: 2

Related Questions