TravisVOX

Reputation: 21631

Slow MySQL queries in Python but fast elsewhere

I'm having a heckuva time dealing with slow MySQL queries in Python. In one area of my application, "load data infile" runs quickly. In another area, the select queries are VERY slow.

Executing the same query in phpMyAdmin AND Navicat (as a second test) yields a response ~5x faster than in Python.

A few notes...

My link to the database is fairly standard...

dbconn=MySQLdb.connect(host="127.0.0.1",user="*",passwd="*",db="*", cursorclass = MySQLdb.cursors.SSCursor)
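One way to narrow this down is to time the execute and fetch phases separately; with an SSCursor the rows stream during iteration, so the cost may show up in the loop rather than in execute(). A minimal timing helper (pure Python; the cursor/query names in the comment are from the question):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# With a live connection, you could time each phase separately, e.g.:
#   _, exec_secs = timed(cursor.execute, query)
#   rows, fetch_secs = timed(cursor.fetchall)
```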

Any insights/help/advice would be greatly appreciated!

UPDATE

In terms of fetching/handling the results, I've tried it a few ways. The initial query is fairly standard...

# Run Query
cursor.execute(query)

I removed all of the code within this loop just to make sure it wasn't the bottleneck, and it's not. I put dummy code in its place. The entire process did not speed up at all.

db_results = "test"

# Loop Results
for row in cursor:

    a = 0  # (this was the dummy code I put in to test)

return db_results

The query result itself is only 501 rows (with a large number of columns)... it took 0.029 seconds outside of Python. It takes significantly longer than that within Python.

The project is related to horse racing. The query is done within this function. The query itself is long; however, it runs well outside of Python. I commented out the code within the loop on purpose for testing, and added the print(query) in hopes of figuring this out.

# Get PPs
def get_pps(race_ids):

    # Comma Race List
    race_list = ','.join(map(str, race_ids))

    # PPs Query
    query = ("SELECT raceindex.race_id, entries.entry_id, entries.prognum, runlines.line_id, runlines.track_code, runlines.race_date, runlines.race_number, runlines.horse_name, runlines.line_date, runlines.line_track, runlines.line_race, runlines.surface, runlines.distance, runlines.starters, runlines.race_grade, runlines.post_position, runlines.c1pos, runlines.c1posn, runlines.c1len, runlines.c2pos, runlines.c2posn, runlines.c2len, runlines.c3pos, runlines.c3posn, runlines.c3len, runlines.c4pos, runlines.c4posn, runlines.c4len, runlines.c5pos, runlines.c5posn, runlines.c5len, runlines.finpos, runlines.finposn, runlines.finlen, runlines.dq, runlines.dh, runlines.dqplace, runlines.beyer, runlines.weight, runlines.comment, runlines.long_comment, runlines.odds, runlines.odds_position, runlines.entries, runlines.track_variant, runlines.speed_rating, runlines.sealed_track, runlines.frac1, runlines.frac2, runlines.frac3, runlines.frac4, runlines.frac5, runlines.frac6, runlines.final_time, charts.raceshape "
             "FROM hrdb_raceindex raceindex "
             "INNER JOIN hrdb_runlines runlines ON runlines.race_date = raceindex.race_date AND runlines.track_code = raceindex.track_code AND runlines.race_number = raceindex.race_number "
             "INNER JOIN hrdb_entries entries ON entries.race_date=runlines.race_date AND entries.track_code=runlines.track_code AND entries.race_number=runlines.race_number AND entries.horse_name=runlines.horse_name "
             "LEFT JOIN hrdb_charts charts ON runlines.line_date = charts.race_date AND runlines.line_track = charts.track_code AND runlines.line_race = charts.race_number "
             "WHERE raceindex.race_id IN (" + race_list + ") "
             "ORDER BY runlines.line_date DESC;")

    print(query)

    # Run Query
    cursor.execute(query)

    # Query Fields
    fields = [i[0] for i in cursor.description]

    # PPs List
    pps = []

    # Loop Results
    for row in cursor:

        a = 0
        #this_pp = {}

        #for i, value in enumerate(row):
        #    this_pp[fields[i]] = value

        #pps.append(this_pp)

    return pps
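As an aside, the function above interpolates race_list straight into the SQL string; MySQLdb can also bind the IN-list values for you. A hedged sketch (the helper name is mine, not from the question); the placeholder-building part is pure Python:

```python
def in_clause_placeholders(values):
    """Build the '%s, %s, ...' placeholder list for an IN (...) clause."""
    return ", ".join(["%s"] * len(values))

# Hypothetical usage with MySQLdb parameter binding:
#   query = ("SELECT ... WHERE raceindex.race_id IN (%s)"
#            % in_clause_placeholders(race_ids))
#   cursor.execute(query, tuple(race_ids))
```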

One final note... I haven't settled on the ideal way to handle the result. I believe one cursor class allows the result to come back as a list of dictionaries. I haven't even made it to that point yet, as the query and the return of results are so slow.
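For the dictionary handling mentioned above, the commented-out loop in the question can be collapsed with zip (MySQLdb also ships MySQLdb.cursors.DictCursor / SSDictCursor cursor classes that do this for you). A minimal sketch of the zip approach:

```python
def rows_to_dicts(fields, rows):
    """Pair each row tuple with the column names taken from cursor.description."""
    return [dict(zip(fields, row)) for row in rows]
```

With the question's variables this would be `pps = rows_to_dicts(fields, cursor)`.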

Upvotes: 4

Views: 8917

Answers (2)

Lord_Chucky

Reputation: 31

I know this is late; however, I have run into similar issues with MySQL and Python. My solution is to run the queries from another language... I use R to make my queries, which is blindingly fast, do what I can in R, and then send the data to Python if need be for more general programming, although R has many general-purpose libraries as well. Just wanted to post something that may help someone with a similar problem, and I know this sidesteps the heart of the problem.

Upvotes: 1

AllInOne

Reputation: 1450

Though you have only 501 rows, it looks like you have over 50 columns. How much total data is being passed from MySQL to Python?

501 rows x 55 columns = 27,555 cells returned.

If each cell averaged "only" 1K that would be close to 27MB of data returned.
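Spelling out the back-of-the-envelope arithmetic above (the 1K-per-cell figure is the answer's assumption, not a measurement):

```python
rows, cols = 501, 55
cells = rows * cols                       # 27,555 cells returned
avg_cell_bytes = 1024                     # assumed "only 1K" average per cell
total_mb = cells * avg_cell_bytes / (1024 * 1024)
# total_mb comes out to roughly 26.9, i.e. close to 27 MB
```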

To get a sense of how much data MySQL is pushing, you can run this right after your query:

SHOW SESSION STATUS LIKE 'Bytes_sent'

Is your server well-resourced? Is memory allocation well configured?

My guess is that when you are using phpMyAdmin you are getting paginated results. This masks the issue of MySQL returning more data than your server can handle (I don't use Navicat, so I'm not sure how it returns results).

Perhaps the Python process is memory-constrained and, when faced with this large result set, it has to page out to disk to handle it.

If you reduce the number of columns called and/or constrain the query with, say, LIMIT 10, do you get improved speed?

Can you see if the server running Python is paging to disk when this query is called? Can you see what memory is allocated to Python, how much is used during the process and how that allocation and usage compares to those same values in the PHP version?

Can you allocate more memory to your constrained resource?

Can you reduce the number of columns or rows that are called through pagination or asynchronous loading?

Upvotes: 2
