Python fetch MySQLdb results in chunks with generator - error

Question

I have the following code:

def executeQuery(conn, query):
    cur = conn.cursor()
    cur.execute(query)
    return cur

def trackTagsGenerator(chunkSize, baseCondition):
    """ Returns a dict of trackId:tag limited to chunkSize. """

    sql = """
        SELECT track_id, tag
        FROM tags 
        WHERE {baseCondition}
    """.format(baseCondition=baseCondition)

    limit = chunkSize
    offset = 0
    while True:
        trackTags = {}

        # fetch the track ids with the coresponding tag
        limitPhrase = " LIMIT %d OFFSET %d" % (limit, offset)
        query = sql + limitPhrase
        offset += limit
        cur = executeQuery(smacConn, query)
        rows = cur.fetchall()

        if not rows:
            break

        for row in rows:
            trackTags[row['track_id']] = row['tag']

        yield trackTags

I want to use it like this:

for trackTags in list(trackTagsGenerator(DATA_CHUNK_SIZE, baseCondition)):        
    print trackTags
    break

This code produces the following error without even fetching one chunk of track tags:

Exception _mysql_exceptions.ProgrammingError: (2014, "Commands out of sync; you can't run this command now") in > ignored

I suspect it's because I have the query execute logic in the body of loop in the generator function.

Is someone able to tell me how to fetch chunks of data using mysqldb in such way?

synthesizerpatel · Accepted Answer

I'm pretty sure this is because it can run into situations where you've got two queries running simultaniously because of the yield. Depending on how you call the function (threads, async, etc..) I'm pretty sure your cursor might get clobbered too?

As well, you're opening yourself up to (sorry, but I can't sugar coat this part) horrific SQL injection holes by inserting baseConditional using essentially a printf. Take a look at the DB-API’s parameter substitution docs for help.

Yield isn't going to save you time or energy here at all, the full sql command will always need to run before you'll get a single result. (Hence you're using LIMIT and OFFSET to make it more friendly, kudos)

i.e. someone updates the table while you're yielding out some data, in this particular case - not the end of the world. In many others, it gets ugly.

If you're just goofing around and you want this to work 'right-now-dammit', it'd probably work to modify executeQuery as such:

 def executeQuery(conn, query):
     cur = conn.cursor()
     cur.execute(query)
     cur = executeQuery(smacConn, query)
     rows = cur.fetchall()
     cur.close()
     return rows

One thing that also kinda jumps out at me - you define trackTags = {}, but then you update tagTrackIds, and yield trackTags.. Which will always be empty dict.

My suggestion would be to not bother yourself with the headache of hand writing SQL if you're just trying to get a hobby project working. Take a look at Elixir which is built on top of SQLAlchemy.

Using an ORM (object-relational-mapper) can be a much more friendly introduction to databases. Defining what your objects look like in Python, and having it automatically generate your schema for you - and being able to add/modify/delete things in a Pythonic manner is really nifty.

If you really need to be async, check out ultramysql python module.

Python fetch MySQLdb results in chunks with generator - error

Answers (2)

Related Questions