Improve page speed when querying database

Question

I am using a Flask for one application I am working on, together with sqlalchemy and MySQL database for storing data. Jinja2 is used for templating. While developing, everything was OK, but now when the larger dataset is used, I am getting very slow rendering of some of the webpages that contain a lot of data. For example, a list of users is handled by this code:

def users():
    q = Users.query.all()
    out = []
    for i in q: 
        try:
            something_i_need = Table.query.filter_by(email=i.email).order_by(Table.date).first().id
        except:
            something_i_need = 0
        out.append({
                'id': i.id,
                'last_name': i.last_name,
                'first_name': i.first_name,
                'middle_name': i.middle_name,
                'phone': i.phone,
                'team': i.team,
                'status': i.status,
                'needed': something_i_need,
                'some_more_data': i.some_more_data
            })
    return render_template('users.html', list_of_users=out)

The data is a bit simplified, since theres more fields in the list. The call to this location, which queries around 2000 users at a time takes more than 10 seconds, and then the load of the page takes 10-20 as well. The table is wrapped by the following theme functions, first table from this link, and although disabling sorting functions on some columns helped, its still very slow.

So, I am wondering, how can I optimize this process or the generation of the template in order to make this faster? I am running this on an Amazon EC2 instance, with Python 2.7.3 and mod_wsgi, Flask 0.9, Flask-SQLAlchemy=0.16, Jinja2==2.7, MySQL-python=1.2.4, SQLAlchemy=0.8.1. The one "obvious" solution, making a pagination and returning 100 or so records at a time would really work since the list has to contain all of the users for sorting purposes (date mostly), so that solution would the last resort.

Thanks in advance!

Sean Vieira · Accepted Answer

Your issue is that you are running N+1 queries (where N is the total number of users in the User table) where 1 would do far better:

User.query \
        .outerjoin(Table, User.email == Table.email) \
        .order_by(User.email, Table.date)

will get you all users with all of their entries in Table ordered by each user's email address and then by the date column in Table. We use an outerjoin rather than a join to ensure that we get ever user, even if they don't have an entry in Table.

Now, this is not quite what we need - we really want to get the latest entry in Table if one exists. There is probably a better way to do this, but here's a simple example with a subquery:

last_entry = Table.query \
                  .group_by(Table.email) \
                  .order_by(Table.date) \
                  .subquery()

# Assuming that db is your Flask-SQLAlchemy extension
results = db.session.query(User, last_entry.c.id)  \
    .outerjoin(last_entry, User.email == last_entry.c.email)

Then you can just materialize the query with a call to all and you'll be off to the races:

return render_template('users.html', list_of_users=results.all())

Improve page speed when querying database

Answers (2)

Related Questions