Jamie Lister
Jamie Lister

Reputation: 63

Mysql BLOB performance

I have a simple Mysql 5.1 (and also tried version 5.6) BLOB table with a string primary key 'filename' and a BLOB field.

CREATE TABLE records (filename VARCHAR(255) PRIMARY KEY, file_content LONGBLOB)

This table has about 1520 records with file_content sizes ranging from about 30KB to 1MB. Here are some surprising query timings:

SELECT filename FROM records -- (31 milli-secs, 1520 rows)
SELECT filename FROM records WHERE filename like '%1-11-1.param' -- (31 milli-secs, 304 rows)
SELECT * FROM records  -- (8.5 seconds, 1520 rows)
SELECT * FROM records WHERE filename = 'file-1-11-1.param' -- (32 milli-secs, 1 row)
SELECT * FROM records WHERE filename like '%1-11-1.param' -- (3.5 seconds, 304 rows)

When the file_content is selected, then the query is slow, except when the WHERE clause has a direct identity. However, if I do a self JOIN instead (mimic of last SELECT)

SELECT a1.filename, a1.file_content FROM records AS a1
INNER JOIN
(SELECT filename FROM records WHERE filename LIKE '%1-11-1.param') AS a2
ON a1.filename = a2.filename
-- (359 milli-secs, 304 rows)

Why is the JOIN so much faster (359 millis < 3.5 seconds) than a simple select * when they produce the same result?

EDIT I've tried the same with SQLServer 2008R2. I can report that this database has no slow down. It is fast for all tests, but does take 9 seconds when all data must be retrieved (like MySQL) since it is returning about 500MB of data.

I've also tried with MySQL Engine=InnoDB, ROW_FORMAT=DYNAMIC. No help. I was hoping this would mean the BLOBs are not loaded into memory for each LIKE test.

Upvotes: 0

Views: 1307

Answers (2)

Jamie Lister
Jamie Lister

Reputation: 63

After doing more research and testings, it appears that for MySQL:

In cases where a field from the row must be tested (e.g. LIKE), then it retrieves every field that appears in the SELECT or WHERE, even if the test fails or could fast-fail. This is a real problem for large BLOB fields, especially BLOBs that don't fit into the row's memory (more than about 65K).

In cases where the test can be determined from an INDEX on the field, then only the matching rows are retrieved.

Upvotes: 1

paul.ago
paul.ago

Reputation: 4103

Probabily all the time spent by the query is just I/O between mysql client and server.

304 rows potentially can result in a 304MB resultset, which is a lot.

I'd rather query the filename field alone if you don't need the whole BLOBs

Upvotes: 0

Related Questions