marcv

Reputation: 1976

Performance of joins on multi-million-row tables

I need to give my website users the ability to select their country, province and city. So I want to display a list of countries, then a list of provinces in the selected country, then a list of cities in the selected province (I don't want any other UI solution for now). Of course, every name must be in the user's language, so I need additional tables for the translations.

Let's focus on the case of the cities. Here are the two tables:

CREATE TABLE `city` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `province_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_fk_city_province` (`province_id`),
  CONSTRAINT `fk_city_province` FOREIGN KEY (`province_id`) REFERENCES `province` (`id`)
) ENGINE=InnoDB;

CREATE TABLE `city_translation` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `city_id` int(10) unsigned NOT NULL,
  `locale_id` int(10) unsigned DEFAULT NULL,
  `name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `idx_fk_city_translation_city` (`city_id`),
  KEY `idx_fk_city_translation_locale` (`locale_id`),
  KEY `idx_city_translation_city_locale` (`city_id`,`locale_id`),
  CONSTRAINT `fk_city_translation_city` FOREIGN KEY (`city_id`) REFERENCES `city` (`id`),
  CONSTRAINT `fk_city_translation_locale` FOREIGN KEY (`locale_id`) REFERENCES `locale` (`id`)
) ENGINE=InnoDB;

The city table contains 4 million rows, and the city_translation table 4 million × the number of languages available on my website. That's 12 million now. If in the future I want to support 10 languages, it will be 40 million...

So I am wondering: is it a bad idea (performance-wise) to work with a table of this size, or is a good index (here on the join columns, city_id and locale_id) enough that the size doesn't matter?

If not, what are the common solutions to this specific (but, I guess, common) problem? I'm only interested in performance. I'm OK with denormalizing if necessary, or even with using other tools (ElasticSearch?) if they are more appropriate.
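
To make it concrete, the lookup I need is roughly this (a sketch, not my exact query; the country and province lists work the same way):

SELECT ct.name
FROM city AS c
JOIN city_translation AS ct
  ON  ct.city_id = c.id
  AND ct.locale_id = ?   -- the user's locale
WHERE c.province_id = ?  -- the selected province
ORDER BY ct.name;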

Upvotes: 3

Views: 1318

Answers (1)

Rick James

Reputation: 142306

Get rid of id in city_translation. Instead, have PRIMARY KEY(city_id, locale_id). With InnoDB, this may double the speed because it cuts out an unnecessary step in the JOINs (a secondary-index lookup followed by a lookup through the clustered PRIMARY KEY). And you can shrink the disk footprint by also removing the two indexes starting with city_id.

Do you think you will go beyond 16M cities? I doubt it. So save one byte by changing (in all tables) city_id to MEDIUMINT UNSIGNED.

Save 3 bytes by changing locale_id to TINYINT UNSIGNED.

Those savings are multiplied by the number of columns and indexes mentioning them.
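
Putting that together, city_translation would look roughly like this (a sketch; it assumes you also change city.id to MEDIUMINT UNSIGNED and locale.id to TINYINT UNSIGNED so the foreign keys still match, and locale_id becomes NOT NULL since it is now part of the PRIMARY KEY):

CREATE TABLE `city_translation` (
  `city_id` mediumint unsigned NOT NULL,
  `locale_id` tinyint unsigned NOT NULL,
  `name` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`city_id`, `locale_id`),
  KEY `idx_fk_city_translation_locale` (`locale_id`),
  CONSTRAINT `fk_city_translation_city` FOREIGN KEY (`city_id`) REFERENCES `city` (`id`),
  CONSTRAINT `fk_city_translation_locale` FOREIGN KEY (`locale_id`) REFERENCES `locale` (`id`)
) ENGINE=InnoDB;

The JOIN lookup now goes straight through the clustered PRIMARY KEY; idx_fk_city_translation_locale stays only because the locale foreign key needs an index of its own.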

How big are the tables (GB)? What is the setting of innodb_buffer_pool_size? How much RAM is there? See if you can make that setting bigger than the total table size and yet no more than 70% of available memory. (That's the only "tunable" that is worth checking.)
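
To check, something like this (a sketch) gives the on-disk size of the tables and the current setting:

-- Approximate size (data + indexes) of the two tables, in GB
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024 / 1024, 2) AS size_gb
FROM information_schema.tables
WHERE table_schema = DATABASE()
  AND table_name IN ('city', 'city_translation');

-- Current buffer pool size, in bytes
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';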

I hope you have a default of CHARACTER SET utf8mb4 for the sake of Chinese users. (But that is another story.)

Upvotes: 4
