Hungry Panda
Hungry Panda

Reputation: 31

SQLAlchemy - subquery in a SELECT clause

I would like to know the best way to achieve the subquery in SELECT clause using SQLAlchemy

Here is my database table.

mysql> select * from feedback_question;
+----+---------------------+-------------+--------------+
| id | created             | question_id | feedbacktype |
+----+---------------------+-------------+--------------+
|  1 | 2022-03-31 09:53:14 |         488 |            1 |
|  2 | 2022-03-31 09:53:21 |         508 |            1 |
|  3 | 2022-03-31 09:53:27 |         607 |            2 |
|  4 | 2022-03-31 09:53:31 |         606 |            1 |
|  5 | 2022-03-31 09:53:33 |         608 |            2 |
|  6 | 2022-03-31 09:55:30 |         608 |            2 |
|  7 | 2022-03-31 09:55:33 |         606 |            1 |
|  8 | 2022-03-31 09:55:40 |         607 |            1 |
|  9 | 2022-03-31 09:58:40 |         607 |            2 |
| 10 | 2022-03-31 09:59:04 |         607 |            2 |
+----+---------------------+-------------+--------------+
10 rows in set (0.00 sec)

Here is the actual query that I would like to define using SQLAlchemy.

mysql> select T.question_id, 
              T.feedbacktype, 
              count(T.feedbacktype) AS "count", 
              round( count(T.feedbacktype) * 100 / 
                         (select count(U.feedbacktype) from feedback_question as U where U.question_id=T.question_id)
              ) AS "countpercent" 
        from feedback_question as T   group by question_id, feedbacktype;
+-------------+--------------+-------+--------------+
| question_id | feedbacktype | count | countpercent |
+-------------+--------------+-------+--------------+
|         488 |            1 |     1 |          100 |
|         508 |            1 |     1 |          100 |
|         606 |            1 |     2 |          100 |
|         607 |            1 |     1 |           25 |
|         607 |            2 |     3 |           75 |
|         608 |            2 |     2 |          100 |
+-------------+--------------+-------+--------------+
6 rows in set (0.00 sec)

Here is how my current statement looks like:

dbsession.query( cls.question_id, cls.feedbacktype, func.count(cls.feedbacktype).label("count"), XXXXXXXXXX.label("countpercent") ).filter(cls.question_id.in_(question_ids)).group_by(cls.question_id, cls.feedbacktype).all()

I am using SQLAlchemy 1.4 and MySQL 5.7

I reviewed the SQLAlchemy documentation about subqueries and scalars, but I am not sure how to apply them in this case.

Thanks!

Upvotes: 3

Views: 3790

Answers (1)

Ian Wilson
Ian Wilson

Reputation: 9119

You will need to use scalar_subquery and aliased. This example is using postgresql but mysql should work. The output is not ordered as it is in your example. There is even more information in the correlated subquery section.

code

import sys
from sqlalchemy import (
    create_engine,
    Integer,
    DateTime
)
from sqlalchemy.schema import (
    Column,
)
from sqlalchemy.sql import select, func
from sqlalchemy.orm import declarative_base, Session, aliased


Base = declarative_base()


username, password, db = sys.argv[1:4]


engine = create_engine(f"postgresql+psycopg2://{username}:{password}@/{db}", echo=False)


class FeedbackQuestion(Base):
    __tablename__ = "feedback_question"
    id = Column(Integer, primary_key=True)
    created_at = Column(DateTime)
    question_id = Column(Integer, nullable=False)
    feedbacktype = Column(Integer, nullable=False)


Base.metadata.create_all(engine)

def extract(line):
    id, created_at, question_id, feedbacktype = line.split('|')[1:-1]
    return dict(id=id, created_at=created_at, question_id=question_id, feedbacktype=feedbacktype)

lines = [extract(line) for line in """|  1 | 2022-03-31 09:53:14 |         488 |            1 |
|  2 | 2022-03-31 09:53:21 |         508 |            1 |
|  3 | 2022-03-31 09:53:27 |         607 |            2 |
|  4 | 2022-03-31 09:53:31 |         606 |            1 |
|  5 | 2022-03-31 09:53:33 |         608 |            2 |
|  6 | 2022-03-31 09:55:30 |         608 |            2 |
|  7 | 2022-03-31 09:55:33 |         606 |            1 |
|  8 | 2022-03-31 09:55:40 |         607 |            1 |
|  9 | 2022-03-31 09:58:40 |         607 |            2 |
| 10 | 2022-03-31 09:59:04 |         607 |            2 |
""".splitlines() if line.strip()]


with Session(engine) as session, session.begin():
    session.add_all([FeedbackQuestion(**line) for line in lines])


with Session(engine) as session, session.begin():
    T = aliased(FeedbackQuestion, name="T")
    U = aliased(FeedbackQuestion, name="U")
    denom = select(func.count(U.feedbacktype)).where(U.question_id == T.question_id).scalar_subquery()
    q = select(T.question_id, T.feedbacktype, func.count(T.feedbacktype).label("count"), func.round((func.count(T.feedbacktype) * 100) / denom).label("countpercent")).group_by(T.question_id, T.feedbacktype)
    print (q)
    for result in session.execute(q):
        print(result)

query

SELECT "T".question_id, "T".feedbacktype, count("T".feedbacktype) AS count, round((count("T".feedbacktype) * :count_1) / (SELECT count("U".feedbacktype) AS count_2 
FROM feedback_question AS "U" 
WHERE "U".question_id = "T".question_id)) AS countpercent 
FROM feedback_question AS "T" GROUP BY "T".question_id, "T".feedbacktype

output

(606, 1, 2, 100.0)
(608, 2, 2, 100.0)
(488, 1, 1, 100.0)
(607, 1, 1, 25.0)
(607, 2, 3, 75.0)
(508, 1, 1, 100.0)

Upvotes: 4

Related Questions