Eagerly loading a circular association

Question

I following is a minimal example of a many-to-many association. My goal is to load a single record of X and also eagerly load the instances of Y which are in the record's ys list as well as the instances of X which are in any of those instance's xs list.

class X(db.Model):
    __tablename__ = 'x'
    xid = db.Column(db.Integer, primary_key=True)
    ys = relationship('Z', back_populates='x', lazy='joined')


class Y(db.Model):
    __tablename__ = 'y'
    yid = db.Column(db.Integer, primary_key=True)
    xs = relationship('Z', back_populates='y', lazy='joined')


class Z(db.Model):
    __tablename__ = 'z'
    xid = db.Column(db.Integer, db.ForeignKey('x.xid'), primary_key=True)
    yid = db.Column(db.Integer, db.ForeignKey('y.yid'), primary_key=True)
    x = relationship('X', back_populates='ys', lazy='joined')
    y = relationship('Y', back_populates='xs', lazy='joined')

My goal is to produce the following result:

expected = [{
    'xid': 1,
    'ys': [
        {'yid': 101, 'xs': [{'xid': 1}, {'xid': 2}, {'xid': 3}]},
        {'yid': 102, 'xs': [{'xid': 1}, {'xid': 2}]},
        {'yid': 104, 'xs': [{'xid': 1}, {'xid': 4}]},
    ],
}]

The SQL statement to achieve this is fairly straightforward:

SELECT x.xid, y.yid, x2.xid FROM x
JOIN z       ON z.xid  = x.xid JOIN y       ON z.yid  = y.yid  ; Fetch Ys
JOIN z as z2 ON z2.yid = y.yid JOIN x as x2 ON z2.xid = x2.xid ; Fetch Xs (depth 2)
WHERE x.xid = 1

My issue is determining how to create a SQLAlchemy query that will either (a) allow me to execute this raw query and map it correctly to the right model instances, or (b) massage the query (with some combination of join and contains_eager calls) so that it knows how to actually use the joins that it's generating so that it doesn't explode into n+1 queries.

The proper query is generated by the following, but I cannot manage to get the depth 2 X instances to load from this query (the data is loaded lazily by a second select).

a = aliased(Z)
b = aliased(X)
q = X.query.filter(X.xid==1).join(X.ys).join(Z.y).join(a, Y.xs).join(b, Z.x)

univerio · Accepted Answer

The way the eager-loading mechanism works is, you need to specify a path to the relationship you want to load as well as how you want to load it. The path is basically which relationships to follow, in sequence, in order to find the relationship that you want. In your particular example, the correct thing to do is this:

q = session.query(X).filter(X.xid == 1) \
           .join(X.ys) \
           .join(Z.y) \
           .join(a, Y.xs) \
           .join(b, Z.x) \
           .options(
               contains_eager(X.ys),
               contains_eager(X.ys, Z.y),
               contains_eager(X.ys, Z.y, Y.xs, alias=a),
               contains_eager(X.ys, Z.y, Y.xs, Z.x, alias=b),
           )

Each contains_eager specifies a load on a single relationship, with the path (X.ys, Z.y, Y.xs, Z.x) specifying where the relationship is, and the contains_eager as well as alias specifying how to load the relationship. This is quite verbose, but fortunately SQLAlchemy provides a shortcut that lets you chain them together, like this:

.options(contains_eager(X.ys).contains_eager(Z.y).contains_eager(Y.xs, alias=a).contains_eager(Z.x, alias=b))

If you're using .join for the explicit goal of then doing contains_eager, you might as well just use joinedload instead:

q = session.query(X).filter(X.xid==1) \
           .options(joinedload(X.ys).joinedload(Z.y).joinedload(Y.xs).joinedload(Z.x))

In your particular case, joining like this may not be efficient if your branch factor is high, i.e. if your X.ys and Y.xs contain a maximum of n entries, then your database has to send you n^2 copies of every single row in X. For this reason, subqueryload is often the right choice for one-to-many relationships (this is not always the case; the trade off is between the number of queries, i.e. latency, against the amount of data in each query, i.e. throughput, so profile to find out):

q = session.query(X).filter(X.xid==1) \
           .options(subqueryload(X.ys).joinedload(Z.y).subqueryload(Y.xs).joinedload(Z.x))

Finally, if all you want is a many-to-many relationship, why not just configure a many-to-many relationship in the first place?

Eagerly loading a circular association

Answers (1)

Related Questions