Sav

Reputation: 9

MySQL: how to handle tables with millions of rows?

I'm looking for solutions to properly archive a very large table (it grows by about 10,000 rows per day).

I currently have this situation:

Order Table:

CREATE TABLE `tbl_order` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `idproduct` int(11) NOT NULL DEFAULT '0',
  `iduser` int(11) NOT NULL DEFAULT '0',
  `state` int(11) NOT NULL DEFAULT '0',
  `progressive` int(11) NOT NULL DEFAULT '0',
  `show-voucher` int(11) NOT NULL DEFAULT '0',
  `voucher-custom` int(11) NOT NULL DEFAULT '0', 
  `check-validate` int(11) NOT NULL DEFAULT '0',
  `code-order` varchar(8) NOT NULL DEFAULT '',
  `code-product` char(15) NOT NULL DEFAULT '',
  `product-year` int(11) NOT NULL DEFAULT '2017',
  `product-area` char(3) NOT NULL DEFAULT '',
  `payment-type` char(3) NOT NULL DEFAULT '',
  `usr-qnt` int(11) NOT NULL DEFAULT '0',
  `usr-id` char(11) NOT NULL DEFAULT '',
  `usr-cid` char(8) NOT NULL DEFAULT '',
  `usr-ct` char(3) NOT NULL DEFAULT '000',
  `price` decimal(10,2) NOT NULL DEFAULT '0.00',
  `price-penale` decimal(10,2) NOT NULL DEFAULT '0.00',
  `price-rate` decimal(10,2) NOT NULL DEFAULT '0.00',
  `price-contanti` decimal(10,2) NOT NULL DEFAULT '0.00',
  `price-bonusmalus-rate` decimal(10,2) NOT NULL DEFAULT '0.00',
  `price-bonusmalus-contanti` decimal(10,2) NOT NULL DEFAULT '0.00',
  `price-incasso-contanti` decimal(10,2) NOT NULL DEFAULT '0.00',
  `rate-qnt` int(11) NOT NULL DEFAULT '0',
  `card-qnt` int(11) NOT NULL DEFAULT '0',

  `grp-user` longtext NOT NULL,
  `grp-price` longtext NOT NULL,
  `grp-item` longtext NOT NULL,
  `grp-element` longtext NOT NULL,

  `bonusmalus-description` varchar(500) NOT NULL,

  `note-s` text NOT NULL ,
  `note-c` text NOT NULL,
  `note-incasso` text NOT NULL,
  `note-interne` text NOT NULL,

  `d-start` date NOT NULL DEFAULT '0000-00-00',
  `d-end` date NOT NULL DEFAULT '0000-00-00',
  `d-create` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `d-incasso` date NOT NULL DEFAULT '0000-00-00',
  `d-sconf` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `d-cconf` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `d-export` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `d-expire` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `d-notify-vote` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `d-lastupdate` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,

  PRIMARY KEY (`id`),
  KEY `iduser` (`iduser`),
  KEY `code-order` (`code-order`),
  KEY `code-product` (`code-product`),
  KEY `idproduct` (`idproduct`),
  KEY `state` (`state`),
  KEY `price` (`price`),
  KEY `usr-qnt` (`usr-qnt`),
  KEY `d-expire` (`d-expire`),
  KEY `d-export` (`d-export`),
  KEY `price-bonusmalus-contanti` (`price-bonusmalus-contanti`),
  KEY `price-penale` (`price-penale`),
  KEY `price-bonusmalus-contanti_2` (`price-bonusmalus-contanti`),
  KEY `price-rate` (`price-rate`),
  KEY `price-contanti` (`price-contanti`),
  KEY `show-voucher` (`show-voucher`),
  KEY `voucher-custom` (`voucher-custom`),
  KEY `check-validate` (`check-validate`),
  KEY `progressive` (`progressive`),
  KEY `d-incasso` (`d-incasso`),
  KEY `price-incasso-contanti` (`price-incasso-contanti`),
  KEY `d-notify-vote` (`d-notify-vote`),
  KEY `product-year` (`product-year`),
  KEY `product-area` (`product-area`),
  KEY `d-lastupdate` (`d-lastupdate`)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;

Orders are the requests that users (iduser) make for tourist packages, similar to Booking.com.

This table grows by about 8,000 to 15,000 rows per day. I'm afraid it may become too big and cause problems.

My core fields are:

I've read about several solutions on the web, but I can't figure out which one would be best:

1) Split the main table into many sub-tables, or across several databases?

In this case, would SELECTs have to be done with a UNION?
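
For illustration only: if the table were split by year into hypothetical tables named tbl_order_2016 and tbl_order_2017 (these names are assumptions, not existing objects), a query spanning both would indeed need a UNION, roughly like this:

-- Sketch: reading across manually split per-year tables
SELECT `id`, `iduser`, `price`, `d-create`
  FROM `db`.`tbl_order_2016`
 WHERE `iduser` = 123
UNION ALL
SELECT `id`, `iduser`, `price`, `d-create`
  FROM `db`.`tbl_order_2017`
 WHERE `iduser` = 123;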

2) Partition the table? Which field would make the best partition key? The product year? The product area (20 values in total)? The order creation date (d-create)? The id?
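
Purely to illustrate what option 2 would mean, a sketch of yearly RANGE partitioning on d-create could look like this (the partition key is an assumption; note that MySQL requires every unique key, including the primary key, to contain the partitioning column, so the primary key would have to change first):

-- The primary key must include the partitioning column.
ALTER TABLE `tbl_order`
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (`id`, `d-create`);

-- One partition per year of order creation.
ALTER TABLE `tbl_order`
  PARTITION BY RANGE (YEAR(`d-create`)) (
    PARTITION p2016 VALUES LESS THAN (2017),
    PARTITION p2017 VALUES LESS THAN (2018),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
  );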

3) Sharding? I don't really know what that involves...

4) InnoDB or MyISAM?

Another solution I could adopt is to split the LONGTEXT fields out into secondary tables to reduce the size of tbl_order:

Doubt: if I do this, I have reduced the size of the tbl_order table, but not the number of rows. Would the db.tbl_order-grp-user, db.tbl_order-grp-price, db.tbl_order-grp-item and db.tbl_order-grp-element tables then also need to be partitioned, for example by range on idorder?
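
A minimal sketch of one such secondary table (assuming the grp data is keyed one-to-one by the order id; the other three would look the same):

CREATE TABLE `tbl_order-grp-user` (
  `idorder` int(11) NOT NULL,
  `data` longtext NOT NULL,
  PRIMARY KEY (`idorder`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;  -- engine/charset should match tbl_order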

The SELECT to fetch all the data would then be:

SELECT *,
  ( SELECT `u`.`data` FROM `db`.`tbl_order-grp-user`  AS `u` WHERE `u`.`idorder` = `order`.`id` ) AS `grp-user`,
  ( SELECT `p`.`data` FROM `db`.`tbl_order-grp-price` AS `p` WHERE `p`.`idorder` = `order`.`id` ) AS `grp-price`,
  ( SELECT `i`.`data` FROM `db`.`tbl_order-grp-item`  AS `i` WHERE `i`.`idorder` = `order`.`id` ) AS `grp-item`
FROM `db`.`tbl_order` AS `order`
WHERE ............

Thanks for all the support! :-)

Upvotes: 0

Views: 1439

Answers (1)

Rick James

Reputation: 142528

Novice alert...

Don't use INT (4 bytes) when MEDIUMINT UNSIGNED (3 bytes) would suffice. Look up the rest of the INT options.
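
For example, if iduser and idproduct never exceed about 16 million and the flag/counter columns stay small, the rows could be narrowed like this (the chosen types are assumptions that depend on the real value ranges):

ALTER TABLE `tbl_order`
  MODIFY `iduser`    MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,  -- max 16,777,215
  MODIFY `idproduct` MEDIUMINT UNSIGNED NOT NULL DEFAULT 0,
  MODIFY `state`     TINYINT UNSIGNED NOT NULL DEFAULT 0,    -- max 255
  MODIFY `usr-qnt`   SMALLINT UNSIGNED NOT NULL DEFAULT 0;   -- max 65,535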

Don't blindly index every column.

Do look at your SELECTs to see which composite indexes would be beneficial. See my Index Cookbook.
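
As an example only, not a recommendation for this schema: if a frequent (hypothetical) query were

SELECT ... FROM `tbl_order` WHERE `iduser` = ? AND `state` = ? ORDER BY `d-create`;

then one composite index would cover both the filtering and the sort:

ALTER TABLE `tbl_order`
  ADD INDEX `idx_iduser_state_dcreate` (`iduser`, `state`, `d-create`);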

Don't worry much about 15K/day -- that is less than 1/sec. 100/sec is the first tipping point toward potential problems.

Don't PARTITION. It is not a panacea, and usually provides no benefits. There are very few use cases.

Don't split into multiple 'identical' tables. Ever. (Well, there are very few valid use cases.)

Don't fear a million rows; do have concern about a billion rows.

Don't use CHAR for variable-length fields; use VARCHAR.
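
For instance, assuming code-product, usr-id and usr-cid really are variable length (the 3-character codes can stay CHAR):

ALTER TABLE `tbl_order`
  MODIFY `code-product` varchar(15) NOT NULL DEFAULT '',
  MODIFY `usr-id`       varchar(11) NOT NULL DEFAULT '',
  MODIFY `usr-cid`      varchar(8)  NOT NULL DEFAULT '';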

Do consider utf8mb4 instead of utf8. utf8mb4 matches the external world's view of UTF-8, and includes Emoji and all of Chinese.

Do use InnoDB. Period. Full stop. MyISAM is going away; InnoDB is as good or better in virtually all respects.
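
Both the engine and (per the previous point) the character set can be changed in place; on a large table these ALTERs rebuild the table, so run them off-peak:

ALTER TABLE `tbl_order` ENGINE=InnoDB;
ALTER TABLE `tbl_order` CONVERT TO CHARACTER SET utf8mb4;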

Consider changing the column names to avoid -; _ is common, and avoids errors when you forget the backticks.
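
Renaming is one ALTER ... CHANGE per column; the full column definition has to be repeated, e.g.:

ALTER TABLE `tbl_order`
  CHANGE `code-order` `code_order` varchar(8) NOT NULL DEFAULT '';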

Don't shard. (This is splitting the data across multiple servers.) This is a medium-sized table with medium-sized traffic; sharding is needed for huge tables with huge traffic.

Do say CHARACTER SET ascii where appropriate, for example on product-area. What you have now takes 9 bytes -- 3 characters * room for 3 bytes (utf8) per character.

Consider TINYINT(3) UNSIGNED ZEROFILL for product-area -- this will take 1 byte and reconstruct the leading zeros for you.
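
The two alternatives just described for product-area would be, respectively (option B assumes the 20 area codes are numeric and fit in 0-255):

-- Option A: keep it textual, but 1 byte per character instead of 3
ALTER TABLE `tbl_order`
  MODIFY `product-area` char(3) CHARACTER SET ascii NOT NULL DEFAULT '000';

-- Option B: store a 1-byte number; ZEROFILL restores the leading zeros on output
ALTER TABLE `tbl_order`
  MODIFY `product-area` TINYINT(3) UNSIGNED ZEROFILL NOT NULL DEFAULT 0;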

Consider whether you have a "natural" PRIMARY KEY instead of AUTO_INCREMENT.

Do tell us what the grp columns contain.

Do come back with tentative SELECT statements. I cannot finish this review without them.

Do consider whether this should be a single table. Is it really the case that one user orders exactly one product? You probably need a table for users, a table for products, a table for orders, etc.

Upvotes: 3
