Howard Zoopaloopa

Reputation: 3822

MySQL Date Range Query Loop in Python

I'm loop querying a few date ranges:

import MySQLdb

con = MySQLdb.connect(host='xxx.com', port=3306, user='user', passwd='pass', db='db')

intervals = 10
precision = 1

def load_data_window(start, precision):
    length = start + precision  # <-- time period
    limit = 20000
    cur = con.cursor()
    sql = "SELECT data FROM table WHERE date < DATE_SUB(NOW(), INTERVAL %s HOUR) AND date > DATE_SUB(NOW(), INTERVAL %s HOUR)"
    cur.execute(sql, (start, length))
    results = cur.fetchall()
    return results

dateChunks = list()
for i in range(0, intervals, precision):
    results = load_data_window(i, precision)
    dateChunks.append(results)

It runs lightning fast for the first few loops, sometimes even all of them, but bogs down significantly from time to time. There are other operations hitting the database from different places... Is there something I can do to give my query priority? Something like a transaction?

EDIT:

I did notice that if I move the con = MySQLdb... inside the load_data_window function, I get really fast results and then bogging, whereas if I keep the con = MySQLdb... outside, it is consistently slower...

Upvotes: 1

Views: 664

Answers (1)

O. Jones

Reputation: 108806

Does your date column have an index on it? Such an index may help performance a great deal. That's especially true if your queries progressively slow down: MySQL may be scanning through your table, further and further for each successive hour, looking for the first row to return.
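
If it doesn't, something along these lines is worth trying. This is a rough sketch run over the same MySQLdb connection; table and date are just the placeholder names from your query, and idx_date is an arbitrary index name, so substitute your real ones:

cur = con.cursor()
# One-time DDL: an index on the date column lets MySQL seek straight to the
# requested hour range instead of scanning further and further into the table.
cur.execute("ALTER TABLE `table` ADD INDEX idx_date (`date`)")
cur.close()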

It looks like you're fetching an hour's worth of data with each request and running fetchall() to slurp all of it into RAM in your Python program at once. Is that what you want? If one hour contains a lot of results, you may hammer the RAM in your Python program, forcing garbage collection and even operating-system thrashing.

Can you, instead, have your Python program iterate over the rows in the result set? That will most likely take a lot less RAM. You can do something like this: http://dev.mysql.com/doc/connector-python/en/connector-python-api-mysqlcursor-fetchone.html

cur.execute(sql,(start,length))
for row in cur:
    print(row)

This sort of strategy would also allow you to process your data efficiently in chunks bigger than an hour if that would help your application. In many, but not all, cases fewer queries means better queries.
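
For example, here is a minimal sketch of that idea, assuming the same connection and the placeholder table/column names from your question. It issues one query for the whole window and uses MySQLdb's server-side SSCursor so rows are streamed rather than buffered in RAM (it also uses >= on the lower bound; see the note on the comparison operator below):

import MySQLdb
import MySQLdb.cursors

def iter_data_window(con, start_hours, end_hours):
    # One query over the whole window; the server-side cursor streams rows
    # instead of pulling the entire result set into RAM at once.
    cur = con.cursor(MySQLdb.cursors.SSCursor)
    sql = ("SELECT data FROM `table` "
           "WHERE date < DATE_SUB(NOW(), INTERVAL %s HOUR) "
           "AND date >= DATE_SUB(NOW(), INTERVAL %s HOUR)")
    cur.execute(sql, (start_hours, end_hours))
    try:
        for row in cur:
            yield row
    finally:
        cur.close()

# One query covering the full 10-hour range, processed row by row.
for row in iter_data_window(con, 0, 10):
    handle(row)  # handle() stands in for whatever per-row processing you need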

Do some of your hours have hugely more rows in them? This could be causing the unpredictable performance.
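
A quick way to check that, again assuming the placeholder table/column names, is to count rows per hour bucket over the range you're querying:

cur = con.cursor()
# Count rows per whole-hour bucket over the last 10 hours to see whether
# some hours are much heavier than others.
sql = ("SELECT TIMESTAMPDIFF(HOUR, date, NOW()) AS hours_ago, COUNT(*) "
       "FROM `table` "
       "WHERE date >= DATE_SUB(NOW(), INTERVAL %s HOUR) "
       "GROUP BY hours_ago "
       "ORDER BY hours_ago")
cur.execute(sql, (10,))
for hours_ago, row_count in cur.fetchall():
    print("%s hours ago: %s rows" % (hours_ago, row_count))
cur.close()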

Finally, you have an off-by-one bug in your query. It should say this...

 WHERE ...  AND date >= DATE_SUB(NOW(), INTERVAL %s HOUR)

(Notice the >= replacing your >)

Upvotes: 1
