Savir
Savir

Reputation: 18418

Load `relationship` with subset of items in SqlAlchemy (similar to Django's `Prefetch` object)

Is there a way to retrieve a subset of elements from a relationship in SqlAlchemy but keeping the order I defined in the backref (relationship) when the class was created?

Let's say I have two tables/models: A User and a UserLog which contains actions performed by the user (the Base is just a declarative_base())

Each UserLog has a ForeignKey to the ID of the User that performed the action, and the type of action for each log can be chosen from an Enum. Each UserLog also has a created timestamp I order with: On a regular scenario, I'd like to get the logs from oldest to newest (oldest first).

However, I'd also like to be able to get a particular user with only a subset of Logs (logs of a particular action) loaded. I understand this goes agains the "gist" of the joined load, since I would be getting an unrealistic state of the database (there are more logs linked to the user that my fetched User object would reflect), but it would be helpful in certain occasions.

class UserLog(Base):
    __tablename__ = 'user_logs'
    id = Column(UUID(as_uuid=True), primary_key=True)
    timestamp_created = Column(
        DateTime, server_default=sqlalchemy.text("now()")
    )
    user_id = Column(
        UUID(as_uuid=True),
        ForeignKey("users.id", ondelete="CASCADE")
    )
    action = Column(SaEnum(UserLogsConstants), nullable=False)

    user = relationship(
        "User", 
        backref=backref(
          'logs', order_by="asc(UserLog.timestamp_created)"
        ) # The order_by is important here
    )

Thanks to this SO answer I've been able to load a subset of logs using contains_eager, but then I lose the ordering by timestamp_created that I stated in the backref.

Let's say I have two available actions for the logs: UPLOAD_START, UPLOAD_END (for the action column)

I've set up a test in which I create a User with 5 UPLOAD_START actions and 5 UPLOAD_END actions created out of order (I'm artificially inserting "newer" actions first).

So what I'd like is getting a User with its .logs relationship containing only logs whose event is UPLOAD_START and keep the order by timestamp specified in the backref object.

Here's the test query

u = session.query(User) \
    .filter(User.id == test_user_id) \
    .join(User.logs) \
    .filter(UserLog.action == UserLogsConstants.UPLOAD_START) \
    .options(contains_eager(User.logs)) \
    .one()

These two asserts work fine:

assert len(u.logs) == 5  
# I inserted 10 logs total, only 5 with action=UPLOAD_START

assert all(ul.action == UserLogsConstants.UPLOAD_START for ul in u.logs)
# Neat, only logs with `UPLOAD_START`

But this one fails: the ordering is not respected.

assert all(
    u.logs[i].timestamp_created < u.logs[i + 1].timestamp_created
    for i in range(len(u.logs) - 1)
)

That makes sense. If I understand the contains_eager behavior, this is kind of a forget what you think the relationship is, and use what I'm telling you in the preceding query (and I'm not saying anything about the order in that query). And, indeeed: I'm looking at the SQL generated by SqlAlchemy and there's no ORDER BY clause. I suppose I could always add it myself to the query, but if would be cleaner if I didn't need to.

For those familiar with Django, I'm kind of trying to emulate the Prefech object, where I can specify a subquery to get the loaded objects from:

User.objects.prefetch_related(
     Prefetch(
         'logs', 
         queryset=UserLogs.objects.filter(action=UserLogsConstants.UPLOAD_START)
     )
).get(pk='foo-uuid')

Upvotes: 0

Views: 2100

Answers (1)

Martijn Pieters
Martijn Pieters

Reputation: 1122022

When using contains_eager, you tell SQLAlchemy to ignore the configured relationship and instead fill the .logs container from your hand-crafted query. That means any ordering set on the relationship is also ignored.

So if you need the log data to be ordered, you need to do so yourself:

u = (
    session.query(User)
        .filter(User.id == test_user_id)
        .join(User.logs)
        .filter(UserLog.action == UserLogsConstants.UPLOAD_START)
        .order_by(User.id, asc(UserLog.timestamp_created))
        .options(contains_eager(User.logs))
        .one()
)

If you are more commonly filtering user logs, then another option is to make logs a dynamic relationship. This replaces the container with a pre-populated Query object you'd filter on:

user = relationship(
    "User", 
    backref=backref(
      'logs', lazy='dynamic', order_by="asc(UserLog.timestamp_created)"
    )
)

and use

u = session.query(User).filter(User.id == test_user_id).one()
upload_start_logs = u.logs.filter(UserLog.action == UserLogsConstants.UPLOAD_START).all()

Of course, this does issue a separate query.

Upvotes: 5

Related Questions