Reputation: 599
I have a few MySQL tables; they have around 300 columns and 100 million rows. They store data from log files, hence the size. I am using the InnoDB engine. A few queries involving joins of these tables, unsurprisingly, do not work. I tried adding indexes to them, but the index creation never finishes.
I wanted to know if there is any other way to speed up performance, or some way to make the 'create index' work on the tables?
Thank you.
Upvotes: 8
Views: 16927
Reputation: 1772
On MySQL tables using the MyISAM engine there are some problems with creating a new secondary index.
A known issue with the MyISAM engine on some MySQL versions, such as 5.7.24 (shipped with Wamp, for instance), means that creating an index not only causes a table scan, as expected, but also requires a full table rebuild. If you just drop an index, the table is rebuilt as well :-(
Ref: https://bugs.mysql.com/bug.php?id=93530
Sometimes you cannot upgrade MySQL, or cannot ask customers to do so just to run your solution. Changing the engine to InnoDB can lead to other problems if you don't need all the features InnoDB provides.
So there is an approach that consists of creating an "index table" manually, with the benefit that you can filter only the records you really need, as I explain below:
Imagine you have 100M records of companies of the world in a table, where about 30M are companies from the USA and 10M from Canada, plus companies from other countries.
Each company has a COUNTRY and a STATE field that you want to index, because you need to search US or Canadian companies by state.
So, in MySQL, if you create an index on Country and State, all 100M records will be indexed, even those with NULL states.
To solve this, you create an index table and a real index on it, like this:
create table index_tb_companies (
    company_id int unique,
    company_country char(2), -- US/CA
    company_state char(2)    -- AL/AK/.../WI/WY
);
create index index_tb_companies_index
on index_tb_companies (company_country, company_state);
Now you can import the original data into the index table with a simple insert into or replace into combined with a filtered select.
replace into index_tb_companies
    (company_id, company_country, company_state)
select
    company_id, company_country, company_state
from original_company_table
where company_country in ('US', 'CA');
This will take a while, since you probably don't have an index on the country column yet and a full table scan will be needed. But the final index-table size will be smaller than a regular MySQL index, since only the US/CA data will be in there.
Now, the final part is to make use of the index table in your specific reports on US and CA companies, since other countries are not covered by the index.
select o.*
from original_company_table o
INNER JOIN index_tb_companies idx ON idx.company_id = o.company_id
where idx.company_country = 'US'
  and idx.company_state = 'NY';
This approach is particularly good when you want to index only a tiny portion of your data in MySQL, so the index size stays small.
Other databases, like PostgreSQL, have "partial indexes": you can create regular indexes and pass a where clause at creation time.
PG Partial Indexes: https://www.postgresql.org/docs/8.0/indexes-partial.html
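For comparison, a minimal sketch of what that looks like in PostgreSQL, reusing the hypothetical companies table from above (names are illustrative):
-- PostgreSQL partial index: only US/CA rows are indexed.
create index companies_us_ca_state_idx
    on original_company_table (company_country, company_state)
    where company_country in ('US', 'CA');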
Upvotes: 0
Reputation: 562498
Creating an index takes time, proportional to the number of rows in the table. 100 million rows is quite a lot for a MySQL table. It will probably take many hours to create an index on that table. Exactly how long varies, based on other factors including your server hardware, the data type of the columns you are creating the index for, other current load on the database, etc.
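For reference, the statement in question is a plain secondary-index build; the table and column names below are made up, and the ALGORITHM/LOCK clauses are optional hints that ask InnoDB for an online, non-locking build (the statement fails up front if that is not possible):
-- Hypothetical log table and column names.
alter table log_entries
    add index idx_logged_at (logged_at),
    algorithm=inplace, lock=none;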
One tool that can help you is pt-online-schema-change. It actually takes longer to build the index, but you can continue to read and write the original table while it's working. Test with a smaller table so you get some experience with using this tool.
You can view a webinar about this tool here: Zero-Downtime Schema Changes in MySQL (free to view, but requires registration).
Another technique is to create an empty table like your original, create the index on that empty table, and then start copying data from your original table into the new table gradually. If this is a log table, it's likely that you write to it more than you read from it, so you can probably swap the tables right away, start logging new events to the new table immediately, and backfill the old data over time.
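A minimal sketch of that swap, assuming a hypothetical log table named log_entries:
-- Hypothetical table names; adjust to your schema.
create table log_entries_new like log_entries;
alter table log_entries_new add index idx_logged_at (logged_at);

-- Atomic swap: new log events go to the indexed table,
-- old data remains in log_entries_old for gradual backfill.
rename table log_entries to log_entries_old, log_entries_new to log_entries;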
A tool like pt-archiver can help you copy data gradually without putting too much load on the server. Simply doing INSERT INTO ... SELECT is not good for your database server's health if you try to copy 100 million rows in one transaction, and it also locks the original table. pt-archiver works by copying just a bite-sized chunk of rows at a time, so it avoids the high cost of such a large transaction.
If you use an auto-increment primary key, take care to adjust the new table's auto-increment value to be higher than the max id in the original table before you let log events start writing to it, so you don't accidentally reuse id values.
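A sketch of that adjustment, continuing with the hypothetical log_entries / log_entries_old names; the literal value stands in for max(id) + 1 from the old table:
-- Find the highest id already used...
select max(id) from log_entries_old;

-- ...and start the new table above it (the value shown is a placeholder,
-- since AUTO_INCREMENT must be given as a literal here).
alter table log_entries auto_increment = 100000001;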
Upvotes: 20
Reputation: 8659
Use
create table newtable like oldtable;
Then add the index to newtable while it is empty.
Then
insert into newtable select * from oldtable;
This may also take a long time to finish.
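If a single 100-million-row INSERT ... SELECT is too heavy, one alternative (similar in spirit to the pt-archiver approach above) is to copy in primary-key ranges, assuming an integer auto-increment id column; the ranges below are illustrative:
-- Copy in chunks of ~100k rows, one small transaction each.
insert into newtable select * from oldtable where id >= 1      and id < 100001;
insert into newtable select * from oldtable where id >= 100001 and id < 200001;
-- ...repeat, advancing the range until max(id) in oldtable is covered.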
Upvotes: 9