SQLAlchemy: Counting and ordering by multiple relationships to the same table

Question

I have a class in SQLAlchemy that has multiple relationships to the same secondary table. It looks somewhat like this:

class Job(Base):
    __tablename__ = 'jobs'
    id = Column(Integer, primary_key=True)
    tasks_queued = relationship("Task", lazy="dynamic",
        primaryjoin="(Task.state == 'queued') & (Task.job_id == Job.id)")
    tasks_running = relationship("Task", lazy="dynamic",
        primaryjoin="(Task.state == 'running') & (Task.job_id == Job.id)")
    tasks_done = relationship("Task", lazy="dynamic",
        primaryjoin="(Task.state == 'done') & (Task.job_id == Job.id)")
    tasks_failed = relationship("Task", lazy="dynamic",
        primaryjoin="(Task.state == 'failed') & (Task.job_id == Job.id)")

class Task(Base):
    __tablename__ = 'tasks'
    id = Column(Integer, primary_key=True)
    job_id = Column(Integer, ForeignKey("jobs.id"))
    state = Column(String(8), nullable=False, default='queued')
    job = relationship("Job")

A job has zero or more tasks. A task can have one of four states: "queued", "running", "done" or "failed". When querying jobs, I want to see the task counts split by state, i.e. how many queued, running, done and failed tasks each job has. I also want to be able to sort the output by any of those counts.

After a bit of googling, I found out how to do that for one relationship:

session.query(Job, func.count(Job.tasks_queued).label("t_queued")).\
outerjoin(Job.tasks_queued).group_by(Job).order_by("t_queued ASC").all()

However, as soon as I try to extend that to more than one relationship, things start to get murky:

session.query(Job, func.count(Job.tasks_queued).label("t_queued"), 
    func.count(Job.tasks_running).label("t_running")).\
outerjoin(Job.tasks_queued).\
outerjoin(Job.tasks_running).group_by(Job).order_by("t_queued ASC").all()

produces this error:

sqlalchemy.exc.OperationalError: (OperationalError) ambiguous column name: tasks.state 'SELECT jobs.id AS jobs_id, count(tasks.state = ? AND tasks.job_id = jobs.id) AS t_queued, count(tasks.state = ? AND tasks.job_id = jobs.id) AS t_running 
FROM jobs LEFT OUTER JOIN tasks ON tasks.state = ? AND tasks.job_id = jobs.id LEFT OUTER JOIN tasks ON tasks.state = ? AND tasks.job_id = jobs.id GROUP BY jobs.id ORDER BY t_queued ASC' ('queued', 'running', 'queued', 'running')

So I somehow need to tell SQLAlchemy that the first count refers to the first join and the second count to the second join. In pure SQL, I would just give the joined tables ad-hoc aliases and then reference those aliases instead of the table names in the count() function. How do I do that in SQLAlchemy?

van · Accepted Answer

You can use aliases in SQLAlchemy the same way, one aliased(Task) per join:

from sqlalchemy import func
from sqlalchemy.orm import aliased

# One alias of Task per state, so each join gets its own table name in the SQL
a_q = aliased(Task)
a_r = aliased(Task)
a_d = aliased(Task)
a_f = aliased(Task)
qry2 = (session.query(Job,
                      func.count(a_q.id.distinct()).label("t_queued"),
                      func.count(a_r.id.distinct()).label("t_running"),
                      func.count(a_d.id.distinct()).label("t_done"),
                      func.count(a_f.id.distinct()).label("t_failed"),
                      )
        # Join each alias along the matching relationship
        .outerjoin(a_q, Job.tasks_queued)
        .outerjoin(a_r, Job.tasks_running)
        .outerjoin(a_d, Job.tasks_done)
        .outerjoin(a_f, Job.tasks_failed)
        .group_by(Job)
        .order_by("t_queued ASC")
        )

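Note the distinct in those counts. I think you need it: each additional outer join multiplies the rows per job, so a plain count would count the same task several times.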
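To see why the distinct matters, here is a minimal sketch that runs roughly the SQL the aliased query should produce, using only the standard-library sqlite3 module instead of SQLAlchemy. The table layout mirrors the models in the question; the sample rows, the alias names q and r, and the restriction to two of the four states are my own assumptions for illustration.

```python
import sqlite3

# In-memory database with the same layout as the Job/Task models in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (id INTEGER PRIMARY KEY);
    CREATE TABLE tasks (
        id INTEGER PRIMARY KEY,
        job_id INTEGER REFERENCES jobs(id),
        state VARCHAR(8) NOT NULL DEFAULT 'queued'
    );
    INSERT INTO jobs (id) VALUES (1), (2);
    -- Made-up sample data: job 1 has 2 queued and 3 running tasks, job 2 has none.
    INSERT INTO tasks (job_id, state) VALUES
        (1, 'queued'), (1, 'queued'),
        (1, 'running'), (1, 'running'), (1, 'running');
""")

# Each join gets its own alias (q, r), so the counts can refer to them unambiguously.
QUERY = """
    SELECT jobs.id,
           count({distinct} q.id) AS t_queued,
           count({distinct} r.id) AS t_running
    FROM jobs
    LEFT OUTER JOIN tasks AS q ON q.state = 'queued' AND q.job_id = jobs.id
    LEFT OUTER JOIN tasks AS r ON r.state = 'running' AND r.job_id = jobs.id
    GROUP BY jobs.id
    ORDER BY jobs.id
"""

# Without DISTINCT, every queued task is paired with every running task (2 * 3 rows).
plain_counts = conn.execute(QUERY.format(distinct="")).fetchall()
# With DISTINCT, each task id is counted once per alias.
distinct_counts = conn.execute(QUERY.format(distinct="DISTINCT")).fetchall()

print(plain_counts)     # [(1, 6, 6), (2, 0, 0)]
print(distinct_counts)  # [(1, 2, 3), (2, 0, 0)]
```

Job 2 still shows up with zero counts because of the outer joins, and count() ignores the NULL ids those joins produce.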
