iwantoski

Reputation: 51

~150ms on a 2-million-row MySQL MyISAM table

I'm learning about MySQL performance with a pet project consisting of two MyISAM tables (~2 million rows and ~600k rows). A range query using BETWEEN on two indexed INT(10) columns, LIMITed to 1 returned result, takes about 160ms (including an INNER JOIN). I figure my configuration isn't optimised and am looking for advice on how to diagnose this, or perhaps a common configuration to start from.

I created a gist containing both tables, the query and the contents of my.cnf.

I created the B-tree index after inserting all the data, which was imported from a CSV file from MaxMind's open database. I tried two separate indexes, and now a combined one, with no difference in performance.

I'm running this locally on a MacBook Pro with a 2.6GHz i5 and 8GB of 1600MHz RAM. MySQL is installed using the downloadable binary from MySQL's download page (unable to supply a third link because my rep is too low). It's a default installation with no major additions to the my.cnf config file, included in the gist (located under the /usr/local/mysql-5.6.xxx/ directory on my system).

My concern is that I'm seeing ~160ms, which indicates to me that I'm missing something. I've considered compressing the table, but I have a feeling I'm missing other configuration options. Also, myisampack wasn't in my PATH (I think), so I'm looking into other optimisations before I explore that further.

Any advice is appreciated!

$ mysql --version
/usr/local/mysql-5.6.23-osx10.8-x86_64/bin/mysql  Ver 14.14 Distrib 5.6.23, for osx10.8 (x86_64) using  EditLine wrapper

Tables

CREATE TABLE `blocks` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `begin_range` int(10) unsigned NOT NULL,
  `end_range` int(10) unsigned NOT NULL,
  `_location_id` int(11) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `begin_range` (`begin_range`,`end_range`)
) ENGINE=MyISAM AUTO_INCREMENT=2008839 DEFAULT CHARSET=ascii;

CREATE TABLE `locations` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `country` varchar(2) NOT NULL DEFAULT '',
  `region` varchar(255) DEFAULT NULL,
  `city` varchar(255) DEFAULT NULL,
  `postalcode` varchar(255) DEFAULT NULL,
  `latitude` float NOT NULL,
  `longitude` float NOT NULL,
  `metro_code` int(11) DEFAULT NULL,
  `area_code` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=641607 DEFAULT CHARSET=utf8;

Query

SELECT locations.latitude, locations.longitude
FROM blocks
INNER JOIN locations ON blocks._location_id = locations.id
WHERE INET_ATON('139.130.4.5') BETWEEN begin_range AND end_range
LIMIT 0, 1;

Edit: Updated the gist with EXPLAIN output for the SELECT, also posted here for convenience.

EXPLAIN SELECT locations.latitude, locations.longitude FROM blocks INNER JOIN locations ON blocks._location_id = locations.id WHERE INET_ATON('94.137.106.123') BETWEEN begin_range AND end_range LIMIT 0, 1;

+----+-------------+-----------+--------+---------------+-------------+---------+---------------------------+---------+------------------------------------+
| id | select_type | table     | type   | possible_keys | key         | key_len | ref                       | rows    | Extra                              |
+----+-------------+-----------+--------+---------------+-------------+---------+---------------------------+---------+------------------------------------+
|  1 | SIMPLE      | blocks    | range  | begin_range   | begin_range | 4       | NULL                      | 1095345 | Using index condition; Using where |
|  1 | SIMPLE      | locations | eq_ref | PRIMARY       | PRIMARY     | 4       | geoip.blocks._location_id |       1 | NULL                               |
+----+-------------+-----------+--------+---------------+-------------+---------+---------------------------+---------+------------------------------------+
2 rows in set (0.00 sec)
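
One way to cross-check the rows estimate above against what the query actually reads is the session handler counters; a quick sketch, using the same query:

-- Reset the session status counters, run the query, then inspect how
-- many index/row reads it performed. Handler_read_next roughly tracks
-- the number of index entries scanned for the range condition.
FLUSH STATUS;

SELECT locations.latitude, locations.longitude
FROM blocks
INNER JOIN locations ON blocks._location_id = locations.id
WHERE INET_ATON('94.137.106.123') BETWEEN begin_range AND end_range
LIMIT 0, 1;

SHOW SESSION STATUS LIKE 'Handler_read%';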

Edit 2: Included the data in the question for convenience.

Upvotes: 1

Views: 583

Answers (1)

Rick James

Reputation: 142278

The problem is that the normal approach (which your code exemplifies) leads to hitting 1095345 rows. I have an approach that can do that query in one disk hit, even if the cache is cold.

Excerpts from http://mysql.rjweb.org/doc.php/ipranges :

The Situation

Your data includes a large set of non-overlapping 'ranges'. These could be IP addresses, datetimes (show times for a single station), zipcodes, etc.

You have pairs of start and end values; one 'item' belongs to each such 'range'. So, instinctively, you create a table with start and end of the range, plus info about the item. Your queries involve a WHERE clause that compares for being between the start and end values.

The Problem

Once you get a large set of items, performance degrades. You play with the indexes, but find nothing that works well. The indexes fail to lead to optimal functioning because the database does not understand that the ranges are non-overlapping.

The Solution

I will present a solution that enforces the fact that items cannot have overlapping ranges. The solution builds a table to take advantage of that, then uses Stored Routines to get around the clumsiness imposed by it.
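
The article gives the full table and stored-routine code; the core idea, sketched roughly here with made-up names (not the article's exact schema), is to store only the start of each non-overlapping range, so a single descending probe on the primary key finds the covering row:

-- Hypothetical single-probe layout: each row stores only the start of a
-- non-overlapping range and implicitly ends where the next row begins.
CREATE TABLE ip_ranges (
  ip_start     INT UNSIGNED NOT NULL,  -- first address in the range (INET_ATON form)
  location_id  INT UNSIGNED NOT NULL,  -- reference into `locations`
  PRIMARY KEY (ip_start)
) ENGINE=MyISAM;

-- Lookup: one descending probe on the PRIMARY KEY instead of a range
-- scan over roughly half the table.
SELECT location_id
FROM ip_ranges
WHERE ip_start <= INET_ATON('139.130.4.5')
ORDER BY ip_start DESC
LIMIT 1;

Gaps (unassigned ranges) still need handling, either as placeholder rows or as an extra check in the lookup, which is the kind of clumsiness the stored routines mentioned above are there to hide.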

Upvotes: 1
