Al Nikolaj
Al Nikolaj

Reputation: 315

How to limit results of subquery in SQLAlchemy?

Im having some difficulties to get this query right. The gist of the Query is: for each question, to retrieve the highest guess together with the guess_id.

subquery = db.session\
    .query(Guess, db.func.max(Guess.amount).label("highest_amount"))\
    .group_by(Guess.id, Guess.amount)\
    .order_by(Guess.amount.desc())\
    .limit(1)\
    .subquery()

query = db.session\
    .query(Question, subquery.c.id, subquery.c.highest_amount)\
    .outerjoin(subquery, Question.id == subquery.c.question_id)\
    .all()

Now, without ".limit(1)", I get all guesses for each question, but I just need the highest guess. If I add limit(1) to the subquery, I get "None" instead and I am unsure why.

Any suggestions? Thanks!

Upvotes: 0

Views: 673

Answers (1)

snakecharmerb
snakecharmerb

Reputation: 55670

This seems to work: it handles questions with no answers and will return all matching guesses in the event of a tie.

subquery = (
    db.session.query(
        Guess.question_id,
        db.func.max(Guess.amount).label('highest_amount'),
    )
    .group_by(Guess.question_id)
    .subquery()
)

query = (
    db.session.query(Question.id, Guess.id, subquery.c.highest_amount)
    .outerjoin(subquery, Question.id == subquery.c.question_id)
    .outerjoin(
        Guess,
        db.and_(
            Guess.question_id == subquery.c.question_id,
            Guess.amount == subquery.c.highest_amount,
        ),
    )
    .order_by(Question.id)
)

for question_id, guess_id, highest_amount in query:
    print(question_id, guess_id, highest_amount)

The approximately equivalent SQL is

select q.id as question_id,
       subq.amount 
  from questions q 
  left outer join (
      select question_id,
             max(amount) as amount
      from guesses
      group by question_id
  ) subq
  on (q.id = subq.question_id)
  left outer join guesses g
  on (g.question_id = subq.question_id and g.amount = subq.amount)
  order by q.id

We don't need a LIMIT in the subquery because the result will contain only one row per question.

Using the rank() may result in more readable code, if your database supports window functions.


subquery = (
    db.session.query(
        Guess.question_id,
        Guess.id,
        Guess.amount,
        db.func.rank()
        .over(partition_by=Guess.question_id, order_by=Guess.amount.desc())
        .label('rank'),
    )
).subquery()

query = (
    db.session.query(Question.id, subquery.c.id, subquery.c.amount)
    .outerjoin(subquery, db.and_(Question.id == subquery.c.question_id, subquery.c.rank == 1))
    .order_by(Question.id)
)

for qid, gid, amount in query:
    print(qid, gid, amount)

The equivalent SQL (on PostgreSQL):

select q.name, subq.guess_id, subq.amount
  from questions q
  left outer join (
    select * 
    from (
        select g.question_id as question_id,
            g.id as guess_id,
            g.amount as amount,
            rank() over (partition by g.question_id order by g.amount desc) as "rank"
        from guesses g
    ) t
    where rank = 1
  ) subq
  on q.id = subq.question_id
  order by q.id;

Upvotes: 1

Related Questions