Reputation: 339
I'm writing an app that uses MySQL to save file hash data to a simple DB with a single table. I's created as follows:
CREATE DATABASE IF NOT EXISTS hash_db;
CREATE TABLE IF NOT EXISTS hash_db.main_tbl
(
sha256 CHAR(64) PRIMARY KEY ,
sha1 CHAR(40) UNIQUE KEY ,
md5 CHAR(32) UNIQUE KEY ,
created DATETIME ,
modified DATETIME ,
size BIGINT ,
ext VARCHAR(260) ,
path TEXT(32768) ,
new_record BOOL
)
ENGINE = MyISAM
CREATE UNIQUE INDEX sha256_idx ON hash_db.main_tbl (sha256)
CREATE UNIQUE INDEX sha1_idx ON hash_db.main_tbl (sha1)
CREATE UNIQUE INDEX md5_idx ON hash_db.main_tbl (md5)
Then I'm doing only simple selects and inserts of the form:
SELECT * FROM hash_db.main_tbl WHERE
sha256 = '...' OR
sha1 = '...' OR
md5 = '...'
INSERT INTO hash_db.main_tbl
(sha256, sha1, md5, created, modified, size, ext, path, new_record) VALUES
(
'...' ,
'...' ,
'...' ,
FROM_UNIXTIME(...) ,
FROM_UNIXTIME(...) ,
... ,
'...' ,
'...' ,
TRUE
)
Data is pretty much random and uniqueness probability is very high (not that it should matter, or should it?). First question, is it normal that InnoDB is much slower (~7x slower) than MyISAM for such usage? I read that it's supposed to be the other way round (tried with 512M innodb_buffer_pool_size, no difference).
Second... I have tested with and without indexes (MyISAM) and the version with indexes is actually slower. These are actual performance data as measured by my app (using performance counters in C):
With indexes:
Selects per second: 393.7
Inserts per second: 1056.1
Without indexes:
Selects per second: 585.3
Inserts per second: 1480.9
The data I'm getting is repeatable. I have tested with enlarged key_buffer_size (32M, default being 8M).
What am I doing wrong or missing?
================================================================================
Edited after Gordon Linoff's suggestion:
I have tried with UNION ALL and I get actually decreased performance, 70 selects per second precisely. The output from EXPLAIN is as follows:
EXPLAIN EXTENDED SELECT * FROM main_hash_db.main_tbl WHERE md5 = '...'
+----+-------------+----------+-------+---------------+------+---------+-------+------+----------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+-------+---------------+------+---------+-------+------+----------+-------+
| 1 | SIMPLE | main_tbl | const | md5 | md5 | 97 | const | 1 | 100.00 | NULL |
+----+-------------+----------+-------+---------------+------+---------+-------+------+----------+-------+
EXPLAIN EXTENDED SELECT * FROM main_hash_db.main_tbl WHERE md5 = '...' UNION ALL SELECT * FROM main_hash_db.main_tbl WHERE sha1 = '...'
+----+--------------+------------+-------+-----------------------+------+---------+-------+------+----------+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+------------+-------+-----------------------+------+---------+-------+------+----------+-----------------+
| 1 | PRIMARY | main_tbl | const | md5 | md5 | 97 | const | 1 | 100.00 | NULL |
| 2 | UNION | main_tbl | const | sha1,sha1_idx,md5_idx | sha1 | 121 | const | 1 | 100.00 | NULL |
| NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using temporary |
+----+--------------+------------+-------+-----------------------+------+---------+-------+------+----------+-----------------+
EXPLAIN EXTENDED SELECT * FROM main_hash_db.main_tbl WHERE md5 = '...' UNION ALL SELECT * FROM main_hash_db.main_tbl WHERE sha1 = '...' UNION ALL SELECT * FROM main_hash_db.main_tbl WHERE sha256 = '...'
+----+--------------+--------------+-------+-----------------------+---------+---------+-------+------+----------+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+--------------+-------+-----------------------+---------+---------+-------+------+----------+-----------------+
| 1 | PRIMARY | main_tbl | const | md5 | md5 | 97 | const | 1 | 100.00 | NULL |
| 2 | UNION | main_tbl | const | sha1,sha1_idx,md5_idx | sha1 | 121 | const | 1 | 100.00 | NULL |
| 3 | UNION | main_tbl | const | PRIMARY,sha256_idx | PRIMARY | 192 | const | 1 | 100.00 | NULL |
| NULL | UNION RESULT | <union1,2,3> | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using temporary |
+----+--------------+--------------+-------+-----------------------+---------+---------+-------+------+----------+-----------------+
Which made me see I had an error in the index creation (I was creating two separate indices for the 'sha1' column). But after the fix things are still slow (~70 selects per second), and here's the output for EXPLAIN
:
+----+--------------+--------------+-------+--------------------+---------+---------+-------+------+----------+-----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------+--------------+-------+--------------------+---------+---------+-------+------+----------+-----------------+
| 1 | PRIMARY | main_tbl | const | md5,md5_idx | md5 | 97 | const | 1 | 100.00 | NULL |
| 2 | UNION | main_tbl | const | sha1,sha1_idx | sha1 | 121 | const | 1 | 100.00 | NULL |
| 3 | UNION | main_tbl | const | PRIMARY,sha256_idx | PRIMARY | 192 | const | 1 | 100.00 | NULL |
| NULL | UNION RESULT | <union1,2,3> | ALL | NULL | NULL | NULL | NULL | NULL | NULL | Using temporary |
+----+--------------+--------------+-------+--------------------+---------+---------+-------+------+----------+-----------------+
================================================================================
Third edit upon further discussion (see below). Here's the EXPLAIN
output for the original query (no additional indexes defined, database is created as described above):
explain extended select path from main_hash_db.main_tbl where sha256 = '...' or md5 = '...' or sha1 = '...' ;
+----+-------------+----------+-------------+------------------+------------------+------------+------+------+----------+--------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+-------------+------------------+------------------+------------+------+------+----------+--------------------------------------------+
| 1 | SIMPLE | main_tbl | index_merge | PRIMARY,sha1,md5 | PRIMARY,md5,sha1 | 192,97,121 | NULL | 3 | 100.00 | Using union(PRIMARY,md5,sha1); Using where |
+----+-------------+----------+-------------+------------------+------------------+------------+------+------+----------+--------------------------------------------+
Performance as measured by my app:
Selects per second: 500.6
Inserts per second: 1394.8
This is the result with 3 selects (issued separately, not with UNION
):
Selects per second: 2525.1
Inserts per second: 1584.3
Upvotes: 0
Views: 526
Reputation: 1
you will decrease performance and slow the flow with your data base because you will create a wide number of Indexes
every time when insert one tuple that s mean you will increase directly the number of Index in your System SGBD
when you do a selection it s like a research .. with a wide number of indexes make some challenge in the System ; like priority and you will have a wide number of priorities you have 1000 tuples in your DB with 3000 indexes
every software have his methode to manage indexes ; and you must know how to master indexes then you can push the System in his maximum potential you can use for example trrigrs with indexes to make a good balance
Upvotes: 0
Reputation: 1269803
First, you would expect insert
without an index to be faster. There is no mystery there. The index does not have to be maintained. In fact, when doing large inserts, a good strategy is often to drop the indexes first, do the inserts, and then rebuild them.
The select
is more troublesome. After all, that is where you want the indexes to be used. Your query is:
SELECT *
FROM hash_db.main_tbl
WHERE sha256 = '...' OR
sha1 = '...' OR
md5 = '...';
This happens to be a worst case for index usage. You would need to look at the explain
to see how the indexes are being used.
My recommendation is to write the query like this:
SELECT *
FROM hash_db.main_tbl
WHERE sha256 = '...'
UNION ALL
SELECT *
FROM hash_db.main_tbl
sha1 = '...'
UNION ALL
SELECT *
FROM hash_db.main_tbl
WHERE md5 = '...';
(Or use union
if you really want to eliminate duplicates.)
This should take advantage of each index for each subquery and should give you the performance you want.
Upvotes: 3