Reputation: 9800
I have an sqlalchemy query like this:
E = aliased(Entity, name='e')
db.session.query(E) \
.filter(E.user_id == user_id) \
.filter(
db.session.query(Entity) \
.distinct(Entity.original_id) \
.order_by(Entity.original_id, Entity.id.desc())
.filter(Entity.ref_id == E.ref_id) \
.from_self(Entity.flag) \
.order_by(Entity.timestamp.desc()) \
.limit(1).as_scalar()
) \
.order_by(E.timestamp) \
.all()
It produces (roughly) the following SQL:
SELECT *
FROM entity AS e
WHERE e.user_id = 1 AND (
SELECT anon_1.entity_flag AS flag
FROM (
SELECT DISTINCT ON (entity.original_id)
entity.flag AS entity_flag, entity.timestamp AS entity_timestamp
FROM entity, entity AS e # THIS "entity AS e" SHOULD NOT BE HERE
WHERE
entity.ref_id = e.ref_id
ORDER BY entity.original_id, entity.id DESC
) AS anon_1
ORDER BY anon_1.entity_timestamp
LIMIT 1
) ORDER BY e.timestamp;
Because it somehow adds entity AS e
into the inner query FROM
clause it causes the WHERE entity.ref_id = e.ref_id
not to reference the outer table as it should.
Why is this extraneous entity AS e
added to the inner FROM
clause and how to overcome it?
Upvotes: 1
Views: 2550
Reputation: 52939
It might be that you've found the limits of Query.from_self()
:
The automatic aliasing feature only works in a limited way, for simple filters and orderings. More ambitious constructions such as referring to the entity in joins should prefer to use explicit subquery objects, typically making use of the
Query.subquery()
method to produce an explicit subquery object. Always test the structure of queries by viewing the SQL to ensure a particular structure does what’s expected!
In general SQLAlchemy's "auto-correlation" considers FROM objects for correlation from the immediate enclosing query only and if deeper levels of nesting are required, explicit correlation must be used.
On the other hand, in your case this does not help. Adding a call to Query.correlate()
does not break through the boundary of Query.from_self()
for some reason, though using them together is even mentioned in the documentation:
The correlation arguments take effect in such cases as when
Query.from_self()
is used, or when a subquery as returned byQuery.subquery()
is embedded in anotherselect()
construct.
A solution is to use explicit subqueries:
In [65]: sq = session.query(Entity.flag, Entity.timestamp).\
...: distinct(Entity.original_id).\
...: order_by(Entity.original_id, Entity.id.desc()).\
...: filter(Entity.ref_id == E.ref_id).\
...: correlate(E).\
...: subquery()
In [66]: q = session.query(E).\
...: filter(E.user_id == 123).\
...: filter(session.query(sq.c.flag).
...: order_by(sq.c.timestamp.desc()).
...: limit(1).
...: as_scalar()).\
...: order_by(E.timestamp)
...:
In [67]: print(q.statement.compile(dialect=postgresql.dialect()))
SELECT e.id, e.ref_id, e.timestamp, e.original_id, e.user_id, e.flag
FROM entity AS e
WHERE e.user_id = %(user_id_1)s AND (SELECT anon_1.flag
FROM (SELECT DISTINCT ON (entity.original_id) entity.flag AS flag, entity.timestamp AS timestamp
FROM entity
WHERE entity.ref_id = e.ref_id ORDER BY entity.original_id, entity.id DESC) AS anon_1 ORDER BY anon_1.timestamp DESC
LIMIT %(param_1)s) ORDER BY e.timestamp
Upvotes: 5