How to load a collection with limited number in Sqlalchemy?

Question

I have two tables. With Sqlalchemy, I map them to two classes:

class A(base):
  ...
  id = Column(BigInteger, primary_key=True, autoincrement=True)

class B(base):
  ...
  id = Column(BigInteger, primary_key=True, autoincrement=True)
  a_id = Column(BigInteger, ForeignKey(A.id))
  timestamp = Column(DateTime)

  a = relationship(A, backref="b_s")

I can use A.b_s to get a collection of B objects whose foreign key are as same as the primary key of A. It's very easy to use lazy load or eager load. But now I have a question. I don't want to load all B objects. I only want to load the first N objects ordered by timestamp. That is to say, A.b_s only loads some of related B objects. How can I use Sqlalchemy to do it?

Thanks a lot!

van · Accepted Answer

What you want to achive will not work with relations (and this is not a SA limitation, rather the proper way to treat relations and take care of the referential integrity).
However a simple query (wrapped in a method) will do the trick just fine:

class A(Base):
    # ...
    def get_children(self, offset, count):
        # @todo: might need to handle some border cases
        qry = B.query.with_parent(self)
        #or: qry = object_session(self).query(B).with_parent(self)
        return qry[offset:offset+count]

my_a = session.query(A).get(a_id)
print my_a.get_children( 0, 10) # print first 10 children
print my_a.get_children(10, 10) # print second 10 children

edit-1: achieve that by having only 1-2 SQL statements
Now, to achieve this in only 1-2 SQL statements is definitely possible.
First of all, one needs a way to get the identifiers of B for top N of each A. For this we will use the sqlalchemy.sql.expression.over function to compose a subquery:

# @note: this is the subquery using *sqlalchemy.orm.over* function to limit number of rows
# this subquery is used for both queries below
# @note: the code below sorts Bs by id, but you can change it in order_by
subq = (session.query(
            B.__table__.c.id.label("b_id"), 
            over(func.row_number(), partition_by="a_id", order_by="id").label("rownum")
       ).subquery())
# this produces the following SQL (@note: the RDBMS should support the OVER...)
# >> SELECT b.id AS b_id, row_number() OVER (PARTITION BY a_id ORDER BY id) AS rownum FROM b

Version-1: Now, the first version will load As, the second will load Bs. The function returns dictionary with As as keys and list of Bs as values:

def get_A_with_Bs_in_batch(b_limit=10):
    """ 
    @return: dict(A, [list of top *b_limit* A.b_s])  
    @note: uses 2 SQL statements, but does not screw up relationship.
    @note: if the relationship is requested via a_instance.b_s, the new SQL statement will be
    issued to load *all* related objects
    """
    qry_a = session.query(A)
    qry_b = (session.query(B)
            .join(subq, and_(subq.c.b_id == B.id, subq.c.rownum <= b_limit))
            )
    a_s = qry_a.all()
    b_s = qry_b.all()
    res = dict((a, [b for b in b_s if b.a == a]) for a in a_s)
    return res

Version-2: this will trick SQLAlchemy to think that the TOP N Bs that are loaded in single query with A is actually A.b_s. VERY DANGEROUS, but neat. Read the comments in code which explain the pieces:

def get_A_with_Bs_hack_relation(b_limit=10):
    """ 
    @return: dict(A, [list of top *b_limit* A.b_s])
    @note: the Bs are loaded as relationship A.b_s, but with the limit.
    """
    qry = (session.query(A)
            .outerjoin(B)
            # @note: next line will trick SA to load joined Bs as if they were *all* objects
            # of relationship A.b_s. this is a @hack: and one should discard/reset a session after this
            # kind of hacky query!!!
            .options(contains_eager(A.b_s))
            .outerjoin(subq, and_(subq.c.b_id == B.id, subq.c.rownum <= b_limit))
            # @note: next line is required to make both *outerjoins* to play well together 
            # in order produce the right result
            .filter(or_(B.id == None, and_(B.id != None, subq.c.b_id != None)))
            )
    res = dict((a, a.b_s) for a in qry.all())
    return res

To summarize, the Version-2 is probably the most direct answer to your question. Use it at own risk, because here you are tricking SA big time and if you modify the relationship property in any way, you might experience "Kaboom!"

How to load a collection with limited number in Sqlalchemy?

Answers (1)

Related Questions