Reputation: 1911
I have a giant MySQL table which is growing all the time. It's recording chat data.
This is what my table looks like:
CREATE TABLE `log` (
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
`channel` VARCHAR(26) NOT NULL,
`timestamp` DATETIME NOT NULL,
`username` VARCHAR(25) NOT NULL,
`message` TEXT NOT NULL,
PRIMARY KEY (`id`),
INDEX `username` (`username`)
)
COLLATE='latin1_swedish_ci'
ENGINE=InnoDB
AUTO_INCREMENT=2582573
;
Indexing the username is important because queries for a username can take around 5 seconds otherwise.
Is there any way to optimize this table even more, to prepare it for huge amounts of data, so that even 100M rows won't be a problem?
Upvotes: 0
Views: 77
Reputation: 142528
`id` BIGINT(20) NOT NULL AUTO_INCREMENT,
Will you have more than 4 billion rows? If not, use INT UNSIGNED, saving 4 bytes per row. Plus another 4 bytes for each row in the secondary index.
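For example, assuming the table will never need more than ~4 billion ids, something like this shrinks the PRIMARY KEY (and every secondary-index entry that carries it):

ALTER TABLE log
    MODIFY id INT UNSIGNED NOT NULL AUTO_INCREMENT;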
`channel` VARCHAR(26) NOT NULL,
`username` VARCHAR(25) NOT NULL,
Normalize each -- that is, replace this by, say, a SMALLINT UNSIGNED and have a mapping between them. Savings: lots.
INDEX `username` (`username`)
That becomes user_id, saving even more.
Smaller --> more cacheable --> faster.
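A sketch of what that normalized schema might look like (the names users, channels, user_id, channel_id are just illustrations; pick the integer sizes based on how many distinct values you expect):

CREATE TABLE users (
    user_id  SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- widen to MEDIUMINT/INT if >65K users
    username VARCHAR(25) NOT NULL,
    PRIMARY KEY (user_id),
    UNIQUE KEY (username)
) ENGINE=InnoDB;

CREATE TABLE channels (
    channel_id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
    channel    VARCHAR(26) NOT NULL,
    PRIMARY KEY (channel_id),
    UNIQUE KEY (channel)
) ENGINE=InnoDB;

CREATE TABLE log (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    channel_id  SMALLINT UNSIGNED NOT NULL,
    `timestamp` DATETIME NOT NULL,
    user_id     SMALLINT UNSIGNED NOT NULL,
    message     TEXT NOT NULL,
    PRIMARY KEY (id),
    INDEX (user_id)
) ENGINE=InnoDB;

A username query then becomes a quick lookup of user_id in users, followed by the index scan on log.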
What other queries will you have?
"Memory usage" -- For InnoDB, set innodb_buffer_pool_size
to about 70% of available RAM. Then, let it worry about what is in memory, what it not. Once the table is too big to be cached, you should shrink the data (as I mentioned above) and provide 'good' indexes (as mentioned in other comments) and perhaps structure the table for "locality of reference" (without knowing all the queries, I can't address this).
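For example, on a server with 16 GB of RAM dedicated to MySQL (adjust to your machine), roughly 70% would be about 11G, set in my.cnf (or changed at runtime with SET GLOBAL innodb_buffer_pool_size in MySQL 5.7.5+):

[mysqld]
innodb_buffer_pool_size = 11G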
You grumbled about using IDs instead of strings... Let's take a closer look at that. How many distinct usernames are there? Channels? How does the data come in -- do you get one row at a time, or batches? Is something doing direct INSERTs, or feeding to some code that does the INSERTs? Could there be a STORED PROCEDURE to do the normalization and insertion? If you need hundreds of rows inserted per second, then I can discuss how to do both, and do them efficiently.
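As a sketch of the stored-procedure route (assuming the users/channels lookup tables above, with UNIQUE keys on the name columns -- the procedure and parameter names are just illustrations):

DELIMITER //
CREATE PROCEDURE log_message(
    IN p_channel  VARCHAR(26),
    IN p_username VARCHAR(25),
    IN p_message  TEXT)
BEGIN
    -- Map the strings to ids, creating them on first sight.
    -- LAST_INSERT_ID(col) makes the existing id available when the row is a duplicate.
    INSERT INTO channels (channel) VALUES (p_channel)
        ON DUPLICATE KEY UPDATE channel_id = LAST_INSERT_ID(channel_id);
    SET @ch := LAST_INSERT_ID();

    INSERT INTO users (username) VALUES (p_username)
        ON DUPLICATE KEY UPDATE user_id = LAST_INSERT_ID(user_id);
    SET @usr := LAST_INSERT_ID();

    INSERT INTO log (channel_id, `timestamp`, user_id, message)
        VALUES (@ch, NOW(), @usr, p_message);
END //
DELIMITER ;

The feed then just does CALL log_message('somechannel', 'someuser', 'hello');. One caveat: INSERT ... ON DUPLICATE KEY UPDATE burns an auto_increment value on every duplicate, so with a small id type you may prefer a SELECT-first pattern.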
You did not ask about PARTITIONs. I do not recommend it for a simple username query.
2.5M rows is about the 85th percentile. 100M rows is more exciting -- 98th percentile.
Upvotes: 1