Trần Kim Dự

Reputation: 6102

Benchmark MySQL with batch insert on multiple threads within same table

I want to test write-intensive workloads against the InnoDB and MyRocks engines of the MySQL database. For this purpose, I use sysbench to benchmark. My requirements are:

I checked all of sysbench's pre-made tests and I don't see any that satisfy my requirements.

Are there any pre-made sysbench tests that satisfy my requirements? If not, can I find custom Lua scripts somewhere that already do this?

(from Comment:)

CREATE TABLE IF NOT EXISTS `tableA` (
    `id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT, 
    `user_id` VARCHAR(63) NOT NULL DEFAULT '', 
    `data` JSON NOT NULL DEFAULT ('{}'),  -- JSON defaults must be expression defaults (MySQL 8.0.13+)
    PRIMARY KEY (`id`), 
    UNIQUE INDEX `user_id_UNIQUE` (`user_id` ASC)
) ENGINE = InnoDB;

Upvotes: 2

Views: 1561

Answers (1)

Rick James

Reputation: 142296

(From a MySQL point of view...)

  • Toss id and the PK -- saves 8 bytes per row.
  • Promote UNIQUE(user_id) to PRIMARY KEY(user_id) -- might save 40 bytes per row (depends on LENGTH(user_id)).

Doing those will

  • Shrink the disk I/O needed (providing some speedup)
  • Eliminate one of the indexes (probably a significant part of the post-load processing)
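To make those two changes concrete, here is a sketch of the revised DDL (column definitions are taken from the table in the question; treat it as illustrative, not a drop-in replacement):

-- Slimmed-down table: no AUTO_INCREMENT id, and user_id promoted
-- from a secondary UNIQUE index to the PRIMARY KEY.
CREATE TABLE IF NOT EXISTS `tableA` (
    `user_id` VARCHAR(63) NOT NULL,
    `data` JSON NOT NULL,
    PRIMARY KEY (`user_id`)
) ENGINE = InnoDB;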

Run OS monitoring tools (e.g., iostat) to see what percentage of the I/O capacity is being consumed. That is likely to be the limiting factor.

Benchmarking products are handy for limited situations. For your situation (and many others), it is best to build your product and time it.

Another thought...

What does the JSON look like? If the JSON has a simple structure (a consistent set of key:value pairs), then the disk footprint might be half as much (hence speed doubling) if you made individual columns. The processing to change from JSON to individual columns would be done in the client, which may (or may not) cancel out the savings I predict.
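As an illustration only (the real JSON structure isn't shown in the question, so these key and column names are hypothetical), a consistent payload such as {"b": 100, "name": "..."} could be flattened like this:

-- Hypothetical flattened layout: each always-present JSON key becomes a real column.
CREATE TABLE IF NOT EXISTS `tableA_flat` (
    `user_id` VARCHAR(63) NOT NULL,
    `b`       SMALLINT NOT NULL,      -- was {"b": 100} inside the JSON
    `name`    VARCHAR(63) NOT NULL,   -- another always-present key
    PRIMARY KEY (`user_id`)
) ENGINE = InnoDB;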

If the JSON is more complex, there still might be savings by pulling out "columns" that are always present.

If the JSON is "big", then compress it in the client, then write to a BLOB. This may shrink the disk footprint and network bandwidth by a factor of 3.
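A sketch of what the table side of that could look like (the compression would normally happen in the client with zlib/gzip; MySQL's COMPRESS() is used below only as a server-side stand-in, and the column name data_gz is made up for the example):

-- The JSON column becomes an opaque BLOB holding the compressed document.
CREATE TABLE IF NOT EXISTS `tableA_compressed` (
    `user_id` VARCHAR(63) NOT NULL,
    `data_gz` BLOB NOT NULL,
    PRIMARY KEY (`user_id`)
) ENGINE = InnoDB;

-- Server-side stand-in for compressing in the client before the INSERT:
INSERT INTO `tableA_compressed` (`user_id`, `data_gz`)
VALUES ('u123', COMPRESS('{"b": 100, "name": "example"}'));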

You mentioned 250GB for 250M rows? That's 1000 bytes/row. That means the JSON averages 700 bytes? (Note: there is overhead.) Compressing the JSON column into a BLOB would shrink to maybe 400 bytes/row total, hence only 100GB for 250M rows.

{"b": 100} takes about 10 bytes. If b could be stored in a 2-byte SMALLINT column, that would shrink the record considerably.

Another thing: if you promote user_id to the PK, then this is worth considering: use a file sort to sort the data by user_id before loading it. This is probably faster than INSERTing the rows 'randomly'. (If the data is already sorted, then this extra sort would be wasted.)
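A minimal SQL-only sketch of that idea, assuming the rows first land in an unindexed staging table (sorting the flat file externally before LOAD DATA would accomplish the same thing):

-- Staging table with no indexes, loaded in whatever order the data arrives.
CREATE TABLE `tableA_stage` (
    `user_id` VARCHAR(63) NOT NULL,
    `data`    JSON NOT NULL
) ENGINE = InnoDB;

-- Feed the real table in PRIMARY KEY order so the PK is built by appending
-- rather than by random insertion.
INSERT INTO `tableA` (`user_id`, `data`)
    SELECT `user_id`, `data`
    FROM `tableA_stage`
    ORDER BY `user_id`;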

Upvotes: 0
