Shay.R
Shay.R

Reputation: 74

Simple count id in MySql table is taking to long

I have to tables with 65.5 Million rows: 1)

CREATE TABLE RawData1 (
  cdasite varchar(45) COLLATE utf8_unicode_ci NOT NULL,
  id int(20) NOT NULL DEFAULT '0',
  timedate datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  type int(11) NOT NULL DEFAULT '0',
  status int(11) NOT NULL DEFAULT '0',
  branch_id int(20) DEFAULT NULL,
  branch_idString varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (id,cdasite,timedate),
  KEY idx_timedate (timedate,cdasite)
) ENGINE=InnoDB;

2) Same table with partition (call it RawData2)

PARTITION BY RANGE ( TO_DAYS(timedate))
(PARTITION p20140101 VALUES LESS THAN (735599) ENGINE = InnoDB,
 PARTITION p20140401 VALUES LESS THAN (735689) ENGINE = InnoDB,
 .
 .
 PARTITION p20201001 VALUES LESS THAN (738064) ENGINE = InnoDB,
 PARTITION future VALUES LESS THAN MAXVALUE ENGINE = InnoDB);

I'm using the same query:

SELECT count(id) FROM RawData1
where timedate BETWEEN DATE_FORMAT(date_sub(now(),INTERVAL 2 YEAR),'%Y-%m-01') AND now();

2 problems: 1. why the partitioned table runs longer then the regular table? 2. the regular table returns 36380217 in 17.094 Sec. is it normal, all R&D leaders think it is not fast enough, it need to return in ~2 Sec.

What do I need to check / do / change ? Is it realistic to scan 35732495 rows and retrieve 36380217 in less then 3-4 sec?

Upvotes: 0

Views: 86

Answers (1)

Rick James
Rick James

Reputation: 142218

  • You have found one example of why PARTITIONing is not a performance panacea.
  • Where does id come from?
  • How many different values are there for cdasite? If thousands, not millions, build a table mapping cdasite <=> id and switch from a bulky VARCHAR(45) to a MEDIUMINT UNSIGNED (or whatever is appropriate). This item may help the most, but perhaps not enough.
  • Ditto for status, but probably using TINYINT UNSIGNED. Or think about ENUM. Either is 1 byte, not 4.
  • The (20) on INT(20) means nothing. You get a 4-byte integer with a limit of about 2 billion.
  • Are you sure there are no duplicate timedates?
  • branch_id and branch_idString -- this smells like a pair that needs to be in another table, leaving only the id here?
  • Smaller -> faster.
  • COUNT(*) is the same as COUNT(id) since id is NOT NULL.
  • Do not include future partitions before they are needed; it slows things down. (And don't use partitioning at all.)

To get that query even faster, build and maintain a Summary Table. It would have at least a DATE in the PRIMARY KEY and at least COUNT(*) as a column. Then the query would fetch from that table. More on Summary tables: http://mysql.rjweb.org/doc.php/summarytables

Upvotes: 1

Related Questions