Reputation: 480
Suppose we have original generated query like that:
SELECT company.x AS company_x, ...
FROM company
LEFT OUTER JOIN acc ON acc.id = company.acc
LEFT OUTER JOIN usercomp_links ON company.id = usercomp_links.pid
LEFT OUTER JOIN usergro_links ON acc.id = usergro_links.pid
WHERE usergro_links.eid = %s OR usercomp_links.eid = %s
And if we add .options(subqueryload(Company.childs))
to this, we will get:
SELECT company.x AS company_x, ..., anon_1.company_id AS anon_1_company_id
FROM (
SELECT company.id AS company_id
FROM company
LEFT OUTER JOIN acc ON acc.id = company.acc
LEFT OUTER JOIN usercomp_links ON company.id = usercomp_links.pid
LEFT OUTER JOIN usergro_links ON acc.id = usergro_links.pid
WHERE usergro_links.eid = %s OR usercomp_links.eid = %s) AS anon_1
INNER JOIN acel_links AS acel_links_1 ON anon_1.company_id = acel_links_1.eid
INNER JOIN company ON company.id = acel_links_1.pid ORDER BY anon_1.company_id
And this is sloooow. If I will get company ids from first query, and load all child companies by hands, it will be blazingly fast in compare to what we get in this case.
I have read documentation, looked at the code, but dont see if I can tell sqlalchemy just get ids from results of first query and load childs in separate, comparatively simple query. I dont rely on this sample - I had more more harder situations, when sqlalchemy just cant load constructed query. And why to do all this work from first query one more time?
So anyone know how to eager load without auto constructed "join from join in join" style?
Upvotes: 12
Views: 14956
Reputation: 16172
Update: the "select in" strategy is now implemented in SQLAlchemy (since v 1.2): see Select IN loading in the documentation.
TLDR:
I think the joinedload
strategy should be used where possible, as it is more efficient than other strategies, including the suggested in the question strategy to load related data using the "IN" statement.
The "IN" strategy can be easily enough implemented "outside" of the SQLAlchemy (see the code below) and probably it should not be complex to implement it as a new loading strategy (as logically it is similar to existing subqueryload
strategy).
Full version:
I started with a simple experiment to see the queries produced by different strategies
The full source code of the experiment is on Github.
My models look this way:
class Author(ModelBase):
__tablename__ = 'authors'
id = Column(Integer, primary_key=True, nullable=False)
name = Column(String(255))
class Book(ModelBase):
__tablename__ = 'books'
id = Column(Integer, primary_key=True)
name = Column(String)
author_id = Column(Integer, ForeignKey('authors.id'))
author = relationship(
'Author', backref=backref('books'))
Now, the tests, first is lazy loading:
books = session.query(Book).all()
print books[0].author.name
session.commit()
Output (cleaned up):
-------------Lazy--------------
sqlalchemy.engine.base.Engine:
SELECT
books.id AS books_id, books.name AS books_name, books.author_id AS books_author_id
FROM books
SELECT
authors.id AS authors_id, authors.name AS authors_name
FROM authors
WHERE authors.id = ?
INFO:sqlalchemy.engine.base.Engine:(1,)
author1
As expected, lazy loading runs one query to fetch books and one query each time we access an author.
Subquery loading:
books = session.query(Book).options(subqueryload(Book.author)).all()
print books[0].author.name
session.commit()
-------------Subquery----------
SELECT
books.id AS books_id, books.name AS books_name, books.author_id AS books_author_id
FROM books
SELECT
authors.id AS authors_id, authors.name AS authors_name,
anon_1.books_author_id AS anon_1_books_author_id
FROM (
SELECT DISTINCT books.author_id AS books_author_id
FROM books) AS anon_1
JOIN authors
ON authors.id = anon_1.books_author_id
ORDER BY anon_1.books_author_id
author1
For the subquery, we have two queries, first fetches books and another fetches authors using the subquery.
Joined loading:
books = session.query(Book).options(joinedload(Book.author)).all()
print books[0].author.name
session.commit()
-------------Joined------------
SELECT
books.id AS books_id, books.name AS books_name,
books.author_id AS books_author_id,
authors_1.id AS authors_1_id, authors_1.name AS authors_1_name
FROM books
LEFT OUTER JOIN authors AS authors_1 ON authors_1.id = books.author_id
author1
The joined strategy runs just one query to fetch both books and authors.
Immediate loading:
books = session.query(Book).options(immediateload(Book.author)).all()
print books[0].author.name
session.commit()
-------------Immediate---------
SELECT
books.id AS books_id, books.name AS books_name, books.author_id AS books_author_id
FROM books
SELECT
authors.id AS authors_id, authors.name AS authors_name
FROM authors
WHERE authors.id = ?
INFO:sqlalchemy.engine.base.Engine:(1,)
SELECT authors.id AS authors_id, authors.name AS authors_name
FROM authors
WHERE authors.id = ?
INFO:sqlalchemy.engine.base.Engine:(2,)
author1
And the immediate
strategy loads books with the first query and then, when we try to access the relation, fetches all the related data with separate query for each related record.
It looks that "joinedload()" should be most efficient in most cases (amd more efficient than "IN" strategy) - we just get all the data with single query.
Now, lets try to implement the IN strategy outside of SQL alchemy:
print '-------------IN----------------'
books = session.query(Book).all()
ids = set()
for b in books:
ids.add(b.author_id)
authors = session.query(Author).filter(Author.id.in_(ids)).all()
print books[0].author.name
print books[1].author.name
print books[2].author.name
print books[3].author.name
Output:
-------------IN----------------
SELECT
books.id AS books_id, books.name AS books_name, books.author_id AS books_author_id
FROM books
SELECT authors.id AS authors_id, authors.name AS authors_name
FROM authors
WHERE authors.id IN (?, ?)
INFO:sqlalchemy.engine.base.Engine:(1, 2)
author1
author1
author2
author2
As we can see, it runs two queries and then we can access all the authors.
Note that we don't join authors to books explicitly, but it still works when we try to access authors through the books, since SQLAlchemy finds author records in the internal identity map and doesn't run additional DB queries.
The "IN" strategy code similar to above can be generalized into the function which can be used with any model / relation. And probably, the "IN" strategy should be relatively easy to implement as a new SQLAlchemy strategy, it is similar to the existing subqueryloading
- it also should run the second query to get the related data.
Upvotes: 17
Reputation: 480
http://docs.sqlalchemy.org/en/latest/orm/loading_relationships.html#sqlalchemy.orm.selectinload
It was added to sqlalchemy, so now you can just use selectinload
strategy.
Upvotes: 3
Reputation: 354
I made a post to the SQLAlchemy mailing list about this: https://groups.google.com/d/msg/sqlalchemy/8-kHuliJpr8/PHUZLLtMEQAJ
The "in" loading that Boris Serebrov mentioned only seems to work one way by default. It will still run queries (if you’re not doing eager loading) if you access relations from the “one” side of a one to many relationship.
I ended up with this solution: https://gist.github.com/pawl/df5ba8923d9929dd1f4fc4e683eced40
Upvotes: 0
Reputation: 1197
You can choose either to work with the abstract ORM layer, in which case, you model the childern attribute as relation with ORM relationship, something like:
from sqlalchemy.orm import relationship
children = relationship("<name of the acl_links class>", lazy="joined")
Using the lazy="joined"
result in eager loading, as requested (this is equivalent to joinedload suggested already by @vsminkov) from the documentation:
The default loader strategy for any relationship() is configured by the lazy keyword argument ... Below we set it as joined so that the children relationship is eager loaded using a JOIN
There are quite a few tweaks you can apply when defining the relationship, so check out the documentation to get the best out of it.
Or you can choose to work with the Query API and compose a query as you choose, for example, do a simple second where-in query, e.g.:
list_of_ids_previously_loaded_companies = <information from your previous query>
the_session.query(<name of the acl_links class>).filter(<name of the acl_links class>.eid.in_(list_of_ids_previously_loaded_companies)
You go even lower and use expression language, something like:
q = select([acl_links]).where(acl_links.c.eid.in_(list_of_ids_previously_loaded_companies))
the_session.execute(q).fetchall()
As final resort, you can do completely raw sql:
from sqlalchemy import text
children_results = a_db_connection.execute(text(<SQL STATEMENT STRING>).fetchall()
Choose which ever works best for your needs. Note it is still your responsibility to model correctly your schema and place proper index and foreign keys to optimize performance.
Upvotes: 0