Gregg
Gregg

Reputation: 3316

Eager loading hierarchical children with explicit self-joins and contains_eager in SQLAlchemy

Given the following relationships:

- 1 MasterProduct parent -> many MasterProduct children
- 1 MasterProduct child -> many StoreProducts
- 1 StoreProduct -> 1 Store

I have defined the following declarative models in SQLAlchemy:

class MasterProduct(Base):
    __tablename__ = 'master_products'
    id = Column(Integer, primary_key=True)
    pid = Column(Integer, ForeignKey('master_products.id'))
    children = relationship('MasterProduct', join_depth=1,
                            backref=backref('parent', remote_side=[id]))
    store_products = relationship('StoreProduct', backref='master_product')


class StoreProduct(Base):
    __tablename__ = 'store_products'
    id = Column(Integer, primary_key=True)
    mid = Column(Integer, ForeignKey('master_products.id'))
    sid = Column(Integer, ForeignKey('stores.id'))
    timestamp = Column(DateTime)
    store = relationship('Store', uselist=False)


class Store(Base):
    __tablename__ = 'stores'
    id = Column(Integer, primary_key=True)

My goal is to replicate the following query in SQLAlchemy with eager loading:

SELECT *
FROM master_products mp_parent
INNER JOIN master_products mp_child ON mp_child.pid = mp_parent.id
INNER JOIN store_products sp1 ON sp1.mid = mp_child.id
LEFT JOIN store_products sp2
  ON sp1.mid = sp2.mid AND sp1.sid = sp2.sid AND sp1.timestamp < sp2.timestamp
WHERE mp_parent.id = 6752 AND sp2.id IS NULL

The query selects all MasterProduct children for parent 6752 and all corresponding store products grouped by most recent timestamp using a NULL self-join (greatest-n-per-group). There are 82 store products returned from the query, with 14 master product children.

I've tried the following to no avail:

mp_child = aliased(MasterProduct)
sp1 = aliased(StoreProduct)
sp2 = aliased(StoreProduct)

q = db.session.query(MasterProduct).filter_by(id=6752) \ 
    .join(mp_child, MasterProduct.children) \
    .join(sp1, mp_child.store_products) \
    .outerjoin(sp2, and_(sp1.mid == sp2.mid, sp1.sid == sp2.sid, sp1.timestamp < sp2.timestamp)) \
    .filter(sp2.id == None) \
    .options(contains_eager(MasterProduct.children, alias=mp_child),
             contains_eager(MasterProduct.children, mp_child.store_products, alias=sp1))

>>> mp_parent = q.first()  # the query below looks ok!
SELECT <all columns from master_products, master_products_1, and store_products_1>
FROM master_products INNER JOIN master_products AS master_products_1 ON master_products.id = master_products_1.pid INNER JOIN store_products AS store_products_1 ON master_products_1.id = store_products_1.mid LEFT OUTER JOIN store_products AS store_products_2 ON store_products_1.mid = store_products_2.mid AND store_products_1.sid = store_products_2.sid AND store_products_1.timestamp < store_products_2.timestamp 
WHERE master_products.id = %s AND store_products_2.id IS NULL 
 LIMIT %s
>>> mp_parent.children  # only *one* child is eagerly loaded (expected 14)
[<app.models.MasterProduct object at 0x2463850>]
>>> mp_parent.children[0].id  # this is correct, 6762 is one of the children
6762L
>>> mp_parent.children[0].pid  # this is correct
6752L
>>> mp_parent.children[0].store_products  # only *one* store product is eagerly loaded (expected 7 for this child)
[<app.models.StoreProduct object at 0x24543d0>]

Taking a step back and simplifying the query to eagerly load just the children also results in only 1 child being eagerly loaded instead of all 14:

mp_child = aliased(MasterProduct)
q = db.session.query(MasterProduct).filter_by(id=6752) \ 
        .join(mp_child, MasterProduct.children)
        .options(contains_eager(MasterProduct.children, alias=mp_child))

However, when I use a joinedload, joinedload_all, or subqueryload, all 14 children are eagerly loaded, i.e.:

q = db.session.query(MasterProduct).filter_by(id=6752) \ 
        .options(joinedload_all('children.store_products', innerjoin=True))

So the problem seems to be populating MasterProduct.children from the explicit join using contains_eager.

Can anyone spot the error in my ways or help point me in the right direction?

Upvotes: 3

Views: 2302

Answers (1)

zzzeek
zzzeek

Reputation: 75127

OK what you might observe in the SQL is that there's a "LIMIT 1" coming out. That's because you're using first(). We can just compare the first two queries, the contains eager, and the joinedload:

join() + contains_eager():

SELECT master_products_1.id AS master_products_1_id, master_products_1.pid AS master_products_1_pid, master_products.id AS master_products_id, master_products.pid AS master_products_pid 
FROM master_products JOIN master_products AS master_products_1 ON master_products.id = master_products_1.pid 
WHERE master_products.id = ?
 LIMIT ? OFFSET ?

joinedload():

SELECT anon_1.master_products_id AS anon_1_master_products_id, anon_1.master_products_pid AS anon_1_master_products_pid, master_products_1.id AS master_products_1_id, master_products_1.pid AS master_products_1_pid 
FROM (SELECT master_products.id AS master_products_id, master_products.pid AS master_products_pid 
FROM master_products 
WHERE master_products.id = ?
 LIMIT ? OFFSET ?) AS anon_1 JOIN master_products AS master_products_1 ON anon_1.master_products_id = master_products_1.pid

you can see the second query is quite different; because first() means a LIMIT is applied, joinedload() knows to wrap the "criteria" query in a subquery, apply the limit to that, then apply the JOIN afterwards. In the join+contains_eager case, the LIMIT is applied to the collection itself and you get the wrong number of rows.

Just changing the script at the bottom to this:

for q, query_label in queries:
    mp_parent = q.all()[0]

I get the output it says you're expecting:

[explicit join with contains_eager] children=3, store_products=27
[joinedload] children=3, store_products=27
[joinedload_all] children=3, store_products=27
[subqueryload] children=3, store_products=27
[subqueryload_all] children=3, store_products=27
[explicit joins with contains_eager, filtered by left-join] children=3, store_products=9

(this is why getting a user-created example is so important)

Upvotes: 4

Related Questions