Reputation: 6102
I want to test write-intensive workloads against the InnoDB and MyRocks engines of MySQL. For this purpose, I use sysbench for benchmarking. My requirements are: multiple threads writing concurrently to the same table, using bulk inserts.
I checked all the pre-made tests of sysbench and I don't see any test that satisfies both requirements:
oltp_write_only: supports multiple threads that write to the same table, but this test doesn't have a bulk insert option.
bulk_insert: supports multiple threads, but each thread writes to a different table.
Are there any pre-made sysbench tests that satisfy my requirements? If not, can I find custom Lua scripts somewhere that already do this?
(from Comment:)
CREATE TABLE IF NOT EXISTS `tableA` (
`id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT,
`user_id` VARCHAR(63) NOT NULL DEFAULT '',
`data` JSON NOT NULL DEFAULT '{}',
PRIMARY KEY (`id`),
UNIQUE INDEX `user_id_UNIQUE` (`user_id` ASC)
) ENGINE = InnoDB;
Upvotes: 2
Views: 1561
Reputation: 142296
(From a MySQL point of view...)
Get rid of id and the PK -- saves 8 bytes per row.
Promote UNIQUE(user_id) to PRIMARY KEY(user_id) -- might save 40 bytes per row (depends on LENGTH(user_id)).
Doing those will shrink each row, and hence the I/O needed for the writes.
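For illustration, a sketch of the table from the question after those two changes (keeping the JSON column as-is for now):
-- Revised schema: no surrogate id, no secondary UNIQUE index;
-- user_id is now the clustered PRIMARY KEY.
CREATE TABLE IF NOT EXISTS `tableA` (
  `user_id` VARCHAR(63) NOT NULL,
  `data`    JSON NOT NULL,
  PRIMARY KEY (`user_id`)
) ENGINE = InnoDB;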
Run OS monitoring tools to see what percentage of the I/O is being consumed. That is likely to be the limiting factor.
Benchmarking products are handy for limited situations. For your situation (and many others), it is best to build your product and time it.
Another thought...
What does the JSON look like? If the JSON has a simple structure (a consistent set of key:value pairs), then the disk footprint might be half as much (hence speed doubling) if you made individual columns. The processing to change from JSON to individual columns would be done in the client, which may (or may not) cancel out the savings I predict.
If the JSON is more complex, there still might be savings by pulling out "columns" that are always present.
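A sketch of that idea, assuming (purely for illustration) that every JSON document carries the same few keys; the column names below are made up, so substitute the keys that are actually always present in your data:
-- Hypothetical: every document always has a small integer "qty" and a short
-- string "note"; store those as real columns and keep only the variable
-- leftovers (if any) as JSON.
CREATE TABLE IF NOT EXISTS `tableA` (
  `user_id` VARCHAR(63) NOT NULL,
  `qty`     SMALLINT UNSIGNED NOT NULL,        -- numeric key pulled out of the JSON
  `note`    VARCHAR(255) NOT NULL DEFAULT '',  -- string key pulled out of the JSON
  `extra`   JSON NULL,                         -- only the keys that are not always present
  PRIMARY KEY (`user_id`)
) ENGINE = InnoDB;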
If the JSON is "big", then compress it in the client, then write it to a BLOB. This may shrink the disk footprint and network bandwidth by a factor of 3.
You mentioned 250GB for 250M rows? That's 1000 bytes/row. That means the JSON averages 700 bytes? (Note: there is overhead.) Compressing the JSON column into a BLOB would shrink to maybe 400 bytes/row total, hence only 100GB for 250M rows.
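The compression itself happens in the client (zlib, gzip, whatever your client library offers), so on the MySQL side the only change is the column type. A sketch:
-- Hypothetical: the client compresses the JSON text and inserts the raw
-- compressed bytes; readers decompress in the client after SELECTing.
CREATE TABLE IF NOT EXISTS `tableA` (
  `user_id` VARCHAR(63) NOT NULL,
  `data`    BLOB NOT NULL,   -- compressed JSON, opaque to MySQL
  PRIMARY KEY (`user_id`)
) ENGINE = InnoDB;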
{"b": 100}
takes about 10 bytes. If b could be stored in a 2-byte SMALLINT
column, that would shrink the record considerably.
Another thing: If you promote user_id to PK, then this is worth considering: Use a file sort to sort the table by user_id before loading it. This is probably faster than INSERTing the rows 'randomly'. (If the data is already sorted, then this extra sort would be wasted.)
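If an external file sort is inconvenient, the same idea can be sketched entirely inside MySQL (the staging-table name here is made up):
-- Load the raw rows into a keyless staging table first, then copy them into
-- the real table in PRIMARY KEY order, so the clustered index is filled sequentially.
CREATE TABLE `tableA_stage` (
  `user_id` VARCHAR(63) NOT NULL,
  `data`    JSON NOT NULL
) ENGINE = InnoDB;

-- ... bulk-load the raw data into tableA_stage here ...

INSERT INTO `tableA` (`user_id`, `data`)
SELECT `user_id`, `data`
FROM `tableA_stage`
ORDER BY `user_id`;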
Upvotes: 0