Bouncner
Bouncner

Reputation: 2379

MySQL bulk inserts with LOAD INFILE - MyISAM merely slower than MEMORY engine

We are currently performing several performance tests on MySQL to compare it to an approach we are developing for a database prototype. To say it short: database is empty, given a huge csv file, load the data into memory as fast as possible.

We are testing on a 12-core Westmere server with 48 GB RAM, so memory consumption is right now not a real issue.

The problem is the following. We haven chosen MySQL (widely spread, open source) for comparison. Since our prototype is an in-memory database, we have chosen the memory engine in MySQL.

We insert this way (file are up to 26 GB large):

drop table if exists a.a;
SET @@max_heap_table_size=40000000000;
create table a.a(col_1 int, col_2 int, col_3 int) ENGINE=MEMORY;
LOAD DATA CONCURRENT INFILE "/tmp/input_files/input.csv" INTO TABLE a.a FIELDS TERMINATED BY ";";

Performing this load on a 2.6 GB file takes about 80 s, which is four times slower that an (wc -l). Using MyISAM is only 4 seconds slower, even though is writing to disk.

What I am doing wrong here? I suppose that a data write using the memory engine must be by far faster than using MyISAM. And I don't understand why wc -l (both single threaded, but writing to mem is not that slow) is that much faster.

PS: changing read_buffer_size or any other vars I found googling, did not result in significant improvements.

Upvotes: 2

Views: 1472

Answers (2)

Cargo23
Cargo23

Reputation: 3189

I think the reason you didn't see a significant difference between the MEMORY engine and the MyISAM engine is due to disk caching. You have 48GB of RAM and are only loading 2.6GB of data.

The MyISAM engine is writing to 'files' but the OS is using its file caching features to make those file writes actually occur in RAM. Then it will 'lazily' make the actual writes to disk. Since you mentioned 'wc', I'll assume you are using Linux. Read up on the dirty_ratio and dirty_background_ratio kernel settings as a starting point to understanding how that works.

Upvotes: 0

Omesh
Omesh

Reputation: 29121

try setting following variables as well

max_heap_table_size=40GB;
bulk_insert_buffer_size=32MB
read_buffer_size=1M
read_rnd_buffer_size=1M

It may reduce query execution time slightly.

Also CONCURRENT works only with MyISAM table and it slows inserts according to manual refer: Load Data Infile

I think you can't compare speed of insert which is a write operation with wc -l which is read operation as writes are always slower as compared to reads.

Loading 2.6GB data in RAM is going to take considerable amount of time. It mostly depends on the write speed of RAM and IO configuration of your OS.

Hope this helps.

Upvotes: 2

Related Questions