gempir

Reputation: 1911

Optimizing a giant mysql table

I have a giant MySQL table which is growing all the time. It records chat data.

This is what my table looks like:

CREATE TABLE `log` (
    `id` BIGINT(20) NOT NULL AUTO_INCREMENT,
    `channel` VARCHAR(26) NOT NULL,
    `timestamp` DATETIME NOT NULL,
    `username` VARCHAR(25) NOT NULL,
    `message` TEXT NOT NULL,
    PRIMARY KEY (`id`),
    INDEX `username` (`username`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB
AUTO_INCREMENT=2582573
;

Indexing the username is kinda important because queries for a username can take like 5 seconds otherwise.

Is there any way of optimizing this table even more to prepare it for huge amounts of data, so that even 100M rows won't be a problem?

Upvotes: 0

Views: 77

Answers (1)

Rick James

Reputation: 142528

`id` BIGINT(20) NOT NULL AUTO_INCREMENT,

Will you have more than 4 billion rows? If not, use INT UNSIGNED, saving 4 bytes per row. Plus another 4 bytes for each row in the secondary index.

`channel` VARCHAR(26) NOT NULL,
`username` VARCHAR(25) NOT NULL,

Normalize each -- that is, replace each of these with, say, a SMALLINT UNSIGNED id and keep a mapping table between the id and the string. Savings: lots.

INDEX `username` (`username`)

That becomes user_id, saving even more.

Smaller --> more cacheable --> faster.
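
A minimal sketch of what the normalized schema could look like, pulling these suggestions together. (Table and column names such as `users`, `channels`, `user_id`, and `channel_id` are my own assumptions; SMALLINT UNSIGNED tops out at 65,535 distinct values, so switch to MEDIUMINT UNSIGNED if you expect more users or channels.)

    CREATE TABLE `users` (
        `user_id` SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
        `username` VARCHAR(25) NOT NULL,
        PRIMARY KEY (`user_id`),
        UNIQUE INDEX `username` (`username`)
    ) ENGINE=InnoDB;

    CREATE TABLE `channels` (
        `channel_id` SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
        `channel` VARCHAR(26) NOT NULL,
        PRIMARY KEY (`channel_id`),
        UNIQUE INDEX `channel` (`channel`)
    ) ENGINE=InnoDB;

    CREATE TABLE `log` (
        `id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
        `channel_id` SMALLINT UNSIGNED NOT NULL,
        `timestamp` DATETIME NOT NULL,
        `user_id` SMALLINT UNSIGNED NOT NULL,
        `message` TEXT NOT NULL,
        PRIMARY KEY (`id`),
        INDEX `user_id` (`user_id`)
    ) ENGINE=InnoDB;

A lookup by username then becomes a join against the small lookup tables:

    SELECT l.`timestamp`, c.`channel`, l.`message`
    FROM `log` AS l
    JOIN `users` AS u    ON u.`user_id` = l.`user_id`
    JOIN `channels` AS c ON c.`channel_id` = l.`channel_id`
    WHERE u.`username` = 'some_user';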

What other queries will you have?

"Memory usage" -- For InnoDB, set innodb_buffer_pool_size to about 70% of available RAM. Then, let it worry about what is in memory, what it not. Once the table is too big to be cached, you should shrink the data (as I mentioned above) and provide 'good' indexes (as mentioned in other comments) and perhaps structure the table for "locality of reference" (without knowing all the queries, I can't address this).

You grumbled about using IDs instead of strings... Let's take a closer look at that. How many distinct usernames are there? channels? How does the data come in -- do you get one row at a time, or batches? Is something doing direct INSERTs or feeding to some code that does the INSERTs? Could there be a STORED PROCEDURE to do the normalization and insertion? If you need hundreds of rows inserted per second, then I can discuss how to do both, and do them efficiently.
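
A rough sketch of such a stored procedure, assuming the `users` and `channels` lookup tables from the earlier sketch. It uses the INSERT ... ON DUPLICATE KEY UPDATE ... LAST_INSERT_ID(col) idiom so each id is fetched-or-created in a single statement:

    DELIMITER //
    CREATE PROCEDURE insert_log(IN p_channel VARCHAR(26),
                                IN p_username VARCHAR(25),
                                IN p_message TEXT)
    BEGIN
        -- fetch-or-create the channel id (relies on the UNIQUE index on `channel`)
        INSERT INTO `channels` (`channel`) VALUES (p_channel)
            ON DUPLICATE KEY UPDATE `channel_id` = LAST_INSERT_ID(`channel_id`);
        SET @channel_id = LAST_INSERT_ID();

        -- fetch-or-create the user id (relies on the UNIQUE index on `username`)
        INSERT INTO `users` (`username`) VALUES (p_username)
            ON DUPLICATE KEY UPDATE `user_id` = LAST_INSERT_ID(`user_id`);
        SET @user_id = LAST_INSERT_ID();

        -- insert the log row with the small integer ids instead of strings
        INSERT INTO `log` (`channel_id`, `timestamp`, `user_id`, `message`)
            VALUES (@channel_id, NOW(), @user_id, p_message);
    END //
    DELIMITER ;

If you need hundreds of rows per second, batching many rows per INSERT (or per transaction) matters at least as much as where the normalization happens.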

You did not ask about PARTITIONs. I do not recommend it for a simple username query.

2.5M rows is about the 85th percentile. 100M rows is more exciting -- 98th percentile.

Upvotes: 1
