user3986580
user3986580

Reputation:

sqlalchemy filtering by model which has multiple relation references

For example, I have Parcel model, which has sender and receiver, both are Subjects. I'm trying to get parcels from particular sender. I don't want to use Parcel.sender.has(), because of performance, my real table is too big.

From docs:

Because has() uses a correlated subquery, its performance is not nearly as good when compared against large target tables as that of using a join.

Here is a full paste-and-run example:

from sqlalchemy import create_engine, Column, Integer, Text, ForeignKey
from sqlalchemy.orm import sessionmaker, relationship
from sqlalchemy.ext.declarative.api import declarative_base
from sqlalchemy.orm.util import aliased

engine = create_engine('sqlite://')
Session = sessionmaker(bind=engine)
s = Session()

Base = declarative_base()


class Subject(Base):
    __tablename__ = 'subject'

    id = Column(Integer, primary_key=True)
    name = Column(Text)


class Parcel(Base):
    __tablename__ = 'parcel'

    id = Column(Integer, primary_key=True)
    sender_id = Column(Integer, ForeignKey('subject.id'))
    receiver_id = Column(Integer, ForeignKey('subject.id'))

    sender = relationship('Subject', foreign_keys=[sender_id], uselist=False, lazy='joined')
    receiver = relationship('Subject', foreign_keys=[receiver_id], uselist=False, lazy='joined')

    def __repr__(self):
        return '<Parcel #{id} {s} -> {r}>'.format(id=self.id, s=self.sender.name, r=self.receiver.name)


# filling database
Base.metadata.create_all(engine)
p = Parcel()
p.sender, p.receiver = Subject(name='Bob'), Subject(name='Alice')
s.add(p)
s.flush()


#
# Method #1 - using `has` method - working but slow
print(s.query(Parcel).filter(Parcel.sender.has(name='Bob')).all())

So, i tried to join and filter by aliased relationship, which raised an error:

#
# Method #2 - using aliased joining - doesn't work
# I'm getting next error: 
#
# sqlalchemy.exc.InvalidRequestError: Could not find a FROM clause to join from.  
# Tried joining to <AliasedClass at 0x7f24b7adef98; Subject>, but got: 
# Can't determine join between 'parcel' and '%(139795676758928 subject)s'; 
# tables have more than one foreign key constraint relationship between them. 
# Please specify the 'onclause' of this join explicitly.
#
sender = aliased(Parcel.sender)
print(s.query(Parcel).join(sender).filter(sender.name == 'Bob').all())

I've found that if I specify Model with join condition instead of relation, it'll work. But final SQL query was'nt what I expect:

print(
    s.query(Parcel)\
    .join(Subject, Parcel.sender_id == Subject.id)\
    .filter(Subject.name == 'Bob')
)

produces next SQL query:

SELECT parcel.id AS parcel_id,
       parcel.sender_id AS parcel_sender_id,
       parcel.receiver_id AS parcel_receiver_id,
       subject_1.id AS subject_1_id,
       subject_1.name AS subject_1_name,
       subject_2.id AS subject_2_id,
       subject_2.name AS subject_2_name
FROM parcel
JOIN subject ON parcel.sender_id = subject.id
LEFT OUTER JOIN subject AS subject_1 ON subject_1.id = parcel.sender_id
LEFT OUTER JOIN subject AS subject_2 ON subject_2.id = parcel.receiver_id
WHERE subject.name = ?

Here you can see that the subject table is being joined three times instead of two. That's because both sender and receiver relations are configured to load joined. And third join is the subject, by which I'm filtering.

I expect that final query will look like this:

SELECT parcel.id AS parcel_id,
       parcel.sender_id AS parcel_sender_id,
       parcel.receiver_id AS parcel_receiver_id,
       subject_1.id AS subject_1_id,
       subject_1.name AS subject_1_name,
       subject_2.id AS subject_2_id,
       subject_2.name AS subject_2_name
FROM parcel
LEFT OUTER JOIN subject AS subject_1 ON subject_1.id = parcel.sender_id
LEFT OUTER JOIN subject AS subject_2 ON subject_2.id = parcel.receiver_id
WHERE subject_1.name = ?

I believe that filtering by multiple referenced relations shouldn't be so unclear and there is better and clearer ways to do it. Please help me find it.

Upvotes: 2

Views: 2679

Answers (1)

leovp
leovp

Reputation: 4758

You have configured it in such a way that sender and reciever will always by join loaded.
You can change it and do a joinedload by hand, when you actually need both of them loaded at the same time with a join.

If you prefer to leave the definitions as they are, you can just "help" SQLAlchemy and point out that the query already has all the data for this comparison and there is no need for an additional join. For this, contains_eager option is used.

The modified query:

q = (s.query(Parcel)
     .join(Parcel.sender)
     .options(contains_eager(Parcel.sender))
     .filter(Subject.name == 'Bob'))

And SQL it produces:

SELECT subject.id AS subject_id,
       subject.name AS subject_name,
       parcel.id AS parcel_id,
       parcel.sender_id AS parcel_sender_id,
       parcel.receiver_id AS parcel_receiver_id,
       subject_1.id AS subject_1_id,
       subject_1.name AS subject_1_name
FROM parcel
JOIN subject ON subject.id = parcel.sender_id
LEFT OUTER JOIN subject AS subject_1 ON subject_1.id = parcel.receiver_id
WHERE subject.name = ?

Upvotes: 1

Related Questions