Reputation: 495
I have a SQLAlchemy query function like this:
def foo():
    local_session = Session()
    results = local_session.query(T.x, T.y, T.z, T.a, T.b, T.c,
                                  T.d, T.e, T.f, T.g, T.h, T.i,
                                  T.j, T.k, T.l, T.m, T.n, T.o,
                                  T.p, T.q, T.r, T.s, T.t, T.u,
                                  T.v, User.gender).join(User)\
        .filter(T.language == 'en', T.where_i_am_from == 'US',
                User.some_num >= 0.9).limit(1000000)
    local_session.close()
    return results, results.count()
The query works fine, and then I call this function here:
def fubar():
    raw_data, raw_data_length = myModule.foo()
    df = pd.DataFrame()
    for each in raw_data:
        df = df.append(pd.DataFrame({...}))  # add each.x etc. to df
    return df
The issue is that it won't iterate over the "for each in raw_data" loop when the .limit on my foo query is above 5000, when I use .all(), or when there is no limit at all. The program just hangs and does nothing (0% CPU usage). I've tested this on both my local SQL server and my Amazon one. When I run the SQL directly on the database, it returns around 800,000 rows. Why is this happening?
I'm using the latest MySQL and the latest SQLAlchemy.
Upvotes: 3
Views: 4127
Reputation: 6758
This may be a MySQL driver problem. I would do the following, in order:

1. Run the program with the -v flag, like python -v yourprogram.py. This has the potential of showing you where the program got stuck.
2. Try a different MySQL driver. That's relatively cheap to do; all you have to do afterwards is change the SQLAlchemy database string. Obviously, this would show you whether the problem lies with the driver or in your code.
3. Join T and User eagerly: do an eager load instead of the default lazy load. If you have 800,000 rows and are doing a lazy join, that may be a problem. Add a joinedload (eagerload in earlier versions of SQLAlchemy) to options (see the sketch below).
Upvotes: 1