user2757902

Reputation: 495

Iterating over query results from SQLAlchemy

I have a SQLAlchemy query function like this:

def foo():
    local_session = Session()
    results = local_session.query(T.x, T.y, T.z, T.a, T.b, T.c,
                                  T.d, T.e, T.f, T.g, T.h, T.i,
                                  T.j, T.k, T.l, T.m, T.n, T.o,
                                  T.p, T.q, T.r, T.s, T.t, T.u,
                                  T.v, User.gender)\
        .join(User)\
        .filter(T.language == 'en', T.where_i_am_from == 'US',
                User.some_num >= 0.9)\
        .limit(1000000)

    local_session.close()
    return results, results.count()

The query works fine, and I then call this function:

def fubar():
    raw_data, raw_data_length = myModule.foo()

    df = pd.DataFrame()
    for each in raw_data:
        df = df.append(pd.DataFrame({...}))  # add each.x etc. to df

    return df

The issue is that it won't iterate over the "for each in raw_data" loop when the .limit on the foo query above is greater than 5000, when I use .all(), or when there is no limit at all. The program just hangs and does nothing (0% CPU usage). I've tested this both on my local SQL server and on my Amazon one. When I run the SQL directly against the database, it returns around 800,000 rows. Why is this happening?

I'm using the latest MySQL and the latest SQLAlchemy.

Upvotes: 3

Views: 4127

Answers (1)

LetMeSOThat4U

Reputation: 6758

This may be a MySQL driver problem. I would try the following, in order:

  1. Run Python with the -v flag, e.g. python -v yourprogram.py.

This may show you where the program is getting stuck.

  2. Get those 800,000 results and stick them in SQLite, with tables in an equivalent schema.

That's relatively cheap to do; all you have to do afterwards is change the SQLAlchemy database string. This will show you whether the problem lies with the driver or in your code.
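Switching the backend is mostly a matter of swapping the URL passed to create_engine; a minimal sketch, where the MySQL credentials, host, and SQLite file name are all placeholders rather than values from the question:

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Original MySQL engine (placeholder URL)
# engine = create_engine("mysql+mysqldb://user:password@host/dbname")

# Point the same code at a local SQLite copy of the data instead
engine = create_engine("sqlite:///local_copy.db")

# Rebind the session factory; the query code itself stays unchanged
Session = sessionmaker(bind=engine)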

  3. You're doing a join between two classes (T, User) - use eager loading instead of the default lazy loading.

If you have 800,000 rows and are doing a lazy join, that may be the problem. Add joinedload (eagerload in earlier versions of SQLAlchemy) to the query's options.
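A minimal sketch of that option, assuming T defines a relationship to User; the relationship name T.user is an assumption about the models, not something from the question:

from sqlalchemy.orm import joinedload

# Eager-load the related User rows in the same SELECT instead of
# issuing one lazy query per parent row (T.user is an assumed
# relationship name)
results = local_session.query(T)\
    .options(joinedload(T.user))\
    .filter(T.language == 'en', T.where_i_am_from == 'US')\
    .limit(1000000)\
    .all()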

Upvotes: 1
