Reputation: 3666
I have an SQL query in PostgreSQL
that I'd like to translate to Django.
select * from main_document
where id in (
select distinct on (document_id) document_id
from main_tokenindex
where token in('token1', 'token2')
order by document_id, relevance desc
) LIMIT 100
I have 2 tables: Document and TokenIndex. 1 to many relationship, a token can be in many documents.
I have this so far:
terms = []
ids = [doc.document_id for doc in TokenIndex.objects.filter(token__in = terms).
distinct('document__id').order_by("-relevance")]
list(Document.objects.filter(pk__in=ids))[:max_res]
As you can see, the problem is that I'm going to the database to get the list of ids, and then going back again to get the documents. This is inefficient because I might be dealing with millions of document ids, whereas I'm only interested in a small subset (defined by the max_res
variable, and LIMIT
in the SQL.
How do I translate the SQL query to Django? I want the Django's query to be like the one I wrote by hand in the sense that it only returns 100 documents for example and not 1.000.000 document ids and then 100 documents.
Upvotes: 0
Views: 516
Reputation: 31951
result = Document.objects.filter(pk__in=(TokenIndex.objects.filter(token__in=terms).distinct('document').order_by('document', '-relevance').values_list('document', flat=True)[:max_res]))
If you don't want 1.000.000 document ids, just return 100, you need LIMIT
in inner query, not in outer query.
Anyway, I think if you even use LIMIT
with outer query it will still translate to the SQL you want.
Upvotes: 2