Vorpal Swordsman
Vorpal Swordsman

Reputation: 375

python script hangs when calling cursor.fetchall() with large data set

I have a query that returns over 125K rows.

The goal is to write a script the iterates through the rows, and for each, populate a second table with data processed from the result of the query.

To develop the script, I created a duplicate database with a small subset of the data (4126 rows)

On the small database, the following code works:

import os
import sys
import random

import mysql.connector

cnx = mysql.connector.connect(user='dbuser', password='thePassword',
                          host='127.0.0.1',
                          database='db')
cnx_out = mysql.connector.connect(user='dbuser', password='thePassword',
                          host='127.0.0.1',
                          database='db')

ins_curs = cnx_out.cursor()

curs = cnx.cursor(dictionary=True)
#curs = cnx.cursor(dictionary=True,buffered=True) #fail

with open('sql\\getRawData.sql') as fh:
    sql = fh.read()

curs.execute(sql, params=None, multi=False)
result = curs.fetchall()  #<=== script stops at this point
print len(result) #<=== this line never executes

print curs.column_names

curs.close()
cnx.close()
cnx_out.close()
sys.exit()

The line curs.execute(sql, params=None, multi=False) succeeds on both the large and small databases. If I use curs.fetchone() in a loop, I can read all records.

If I alter the line:

curs = cnx.cursor(dictionary=True)

to read:

curs = cnx.cursor(dictionary=True,buffered=True)

The script hangs at curs.execute(sql, params=None, multi=False).

I can find no documentation on any limits to fetchall(), nor can I find any way to increase the buffer size, and no way to tell how large a buffer I even need.

There are no exceptions raised.

How can I resolve this?

Upvotes: 6

Views: 3353

Answers (1)

mactyr
mactyr

Reputation: 563

I was having this same issue, first on a query that returned ~70k rows and then on one that only returned around 2k rows (and for me RAM was also not the limiting factor). I switched from using mysql.connector (i.e. the mysql-connector-python package) to MySQLdb (i.e. the mysql-python package) and then was able to fetchall() on large queries with no problem. Both packages seem to follow the python DB API, so for me MySQLdb was a drop-in replacement for mysql.connector, with no code changes necessary beyond the line that sets up the connection. YMMV if you're leveraging something specific about mysql.connector.

Pragmatically speaking, if you don't have a specific reason to be using mysql.connector the solution to this is just to switch to a package that works better!

Upvotes: 1

Related Questions