Reputation: 34673
I have the following code:
def executeQuery(conn, query):
cur = conn.cursor()
cur.execute(query)
return cur
def trackTagsGenerator(chunkSize, baseCondition):
""" Returns a dict of trackId:tag limited to chunkSize. """
sql = """
SELECT track_id, tag
FROM tags
WHERE {baseCondition}
""".format(baseCondition=baseCondition)
limit = chunkSize
offset = 0
while True:
trackTags = {}
# fetch the track ids with the coresponding tag
limitPhrase = " LIMIT %d OFFSET %d" % (limit, offset)
query = sql + limitPhrase
offset += limit
cur = executeQuery(smacConn, query)
rows = cur.fetchall()
if not rows:
break
for row in rows:
trackTags[row['track_id']] = row['tag']
yield trackTags
I want to use it like this:
for trackTags in list(trackTagsGenerator(DATA_CHUNK_SIZE, baseCondition)):
print trackTags
break
This code produces the following error without even fetching one chunk of track tags:
Exception _mysql_exceptions.ProgrammingError: (2014, "Commands out of sync; you can't run this command now") in <bound method SSDictCursor.__del__ of <MySQLdb.cursors.SSDictCursor object at 0x10b067b90>> ignored
I suspect it's because I have the query execute logic in the body of loop in the generator function.
Is someone able to tell me how to fetch chunks of data using mysqldb in such way?
Upvotes: 2
Views: 3449
Reputation: 91119
You use a SSDictCursor
, something that maps to mysql_use_result()
on MySQL-API-side. This requires that you read out the complete result before you can issue a new command.
As this happens before you receive the first chunk of data after all: are yu sure that this doesn't happen in the context of the query before this part of code is executed? The results of that last query might be still in the line, and executing the next one (i. e., the fist one in this context) might break things...
Upvotes: 1
Reputation: 28036
I'm pretty sure this is because it can run into situations where you've got two queries running simultaniously because of the yield. Depending on how you call the function (threads, async, etc..) I'm pretty sure your cursor might get clobbered too?
As well, you're opening yourself up to (sorry, but I can't sugar coat this part) horrific SQL injection holes by inserting baseConditional using essentially a printf. Take a look at the DB-API’s parameter substitution docs for help.
Yield isn't going to save you time or energy here at all, the full sql command will always need to run before you'll get a single result. (Hence you're using LIMIT and OFFSET to make it more friendly, kudos)
i.e. someone updates the table while you're yielding out some data, in this particular case - not the end of the world. In many others, it gets ugly.
If you're just goofing around and you want this to work 'right-now-dammit', it'd probably work to modify executeQuery as such:
def executeQuery(conn, query):
cur = conn.cursor()
cur.execute(query)
cur = executeQuery(smacConn, query)
rows = cur.fetchall()
cur.close()
return rows
One thing that also kinda jumps out at me - you define trackTags = {}, but then you update tagTrackIds, and yield trackTags.. Which will always be empty dict.
My suggestion would be to not bother yourself with the headache of hand writing SQL if you're just trying to get a hobby project working. Take a look at Elixir which is built on top of SQLAlchemy.
Using an ORM (object-relational-mapper) can be a much more friendly introduction to databases. Defining what your objects look like in Python, and having it automatically generate your schema for you - and being able to add/modify/delete things in a Pythonic manner is really nifty.
If you really need to be async, check out ultramysql python module.
Upvotes: 1