Reputation: 10397
I am doing a large insert by reading a file. The file looks like this:
sampletext1
sampletext2
..........
..........
sampletextN
There are millions of lines in the file and it is around 3 GB in size. Reading every line into a single variable and then doing one big INSERT won't work, because I have only around 2 GB of RAM.
So I read the file line by line and build up a MySQL INSERT string. Once the code has read 5000 lines, I INSERT them into the DB, so each INSERT carries 5000 records. The query in my code (INSERT IGNORE INTO $curr VALUES $string) runs at normal speed until around 25000 lines have been read and INSERTed, but then it slows down and takes around 5-10 seconds for a single INSERT. The insert speed seems to keep decreasing linearly as the number of records in the table grows.
Perl code snippet:
sub StoreToDB {
    my $self = shift;
    $self->_doPreliminary();
    my $data_struc = $self->_getDATA();
    my $file       = $data_struc->{DOMAIN_FILE};
    my ($count,$cnt,$string,$curr) = (0,0,'',$self->_getTLD() . '_current');
    open FH,$file or ( FullLogger($self->_getTLD(),"Can't open $file from StoreToDB : $!\n") and return );
    $self->_dbConnect();
    while (<FH>) {
        chomp;
        if ( $cnt == MAX ) {
            $self->_dbExecute("INSERT IGNORE INTO $curr VALUES $string");
            $count += $cnt;
            $cnt    = 0;
            $string = '';
            Logger("Inside StoreToDB, count is : $count ***\n");
        }
        $string .= "('" . $_ . "')";
        ++$cnt;
        $string = ($cnt != MAX ? $string . ',' : $string . ';');
    }#while
    if ($cnt) {                    # flush the final, partial batch
        $string =~ s/,\z/;/;
        $self->_dbExecute("INSERT IGNORE INTO $curr VALUES $string");
        $count += $cnt;
    }
    close FH;
    $self->_dbDisconnect();
    return 1;
}#StoreToDB
==============================
DB table details :
mysql> SHOW CREATE TABLE com_current;
+-------------+-------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------------+-------------------------------------------------------------------------------------------------------------------------------+
| com_current | CREATE TABLE `com_current` (
`domain` varchar(60) NOT NULL,
PRIMARY KEY (`domain`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
+-------------+-------------------------------------------------------------------------------------------------------------------------------+
1 row in set (16.60 sec)
mysql>
MySQL status output :
Uptime: 1057 Threads: 2 Questions: 250 Slow queries: 33 Opens: 38 Flush tables: 1 Open tables: 28 Queries per second avg: 0.236
===============================================================
UPDATE:
So far I have tried the methods below, but none of them was any faster:
1) LOCK TABLES my_table WRITE;
then after inserting, I unlock it,
UNLOCK TABLES;
2) INSERT DELAYED IGNORE INTO $curr VALUES $string
3) LOAD DATA INFILE '$file' IGNORE INTO TABLE $curr
This is currently in progress, but so far it seems even slower than the original method.
I don't know whether my my.cnf has any issues, so I have pasted it below:
[client]
port = 3306
socket = /tmp/mysql.sock
[mysqld]
datadir = /mnt/mysql/data
port = 3306
socket = /tmp/mysql.sock
skip-external-locking
key_buffer_size = 16M
max_allowed_packet = 1M
table_open_cache = 64
sort_buffer_size = 512K
net_buffer_length = 8K
read_buffer_size = 256K
read_rnd_buffer_size = 512K
myisam_sort_buffer_size = 8M
log-bin=mysql-bin
binlog_format=mixed
server-id = 1
[mysqldump]
quick
max_allowed_packet = 16M
[mysql]
no-auto-rehash
[myisamchk]
key_buffer_size = 20M
sort_buffer_size = 20M
read_buffer = 2M
write_buffer = 2M
[mysqlhotcopy]
interactive-timeout
Upvotes: 3
Views: 2368
Reputation: 5072
As several people have mentioned, LOAD DATA INFILE is going to be the fastest method of getting data into MySQL, and it's worthwhile to load into a fresh table if at all possible.
Some original research I did on this a while ago:
http://mysqldump.azundris.com/archives/94-LOAD-DATA-INFILE-and-mysqldump.html
The major gotcha is that large LOADs can wreak havoc on your replication.
Upvotes: 0
Reputation: 69294
As others have said, using LOAD DATA INFILE is almost certainly your best approach.
But there's one obvious improvement to your Perl code that you could also try. I don't know how your database interaction is working (_dbExecute isn't a Perl DBI method), but it looks like it's going to be preparing the SQL statement every time. That's going to be rather inefficient. Instead, you should prepare the statement once and use placeholders in it to insert the variable data.
In DBI terms, you're doing something like this:
foreach (@data) {
    my $sth = $dbh->prepare("INSERT INTO SOME_TABLE (COL1) VALUES ('$_')");
    $sth->execute;
}
When you should be doing something like this:
my $sth = $dbh->prepare('INSERT INTO SOME_TABLE (COL1) VALUES (?)');
foreach (@data) {
    $sth->execute($_);
}
You'll almost certainly find that more efficient.
See the documentation on Placeholders and Bind Values for more details.
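Applied to the file-reading loop in the question, that might look roughly like the sketch below (the column name comes from the table definition in the question; the connection details and file path are placeholders, not from the original post):

use strict;
use warnings;
use DBI;

# A sketch of the prepare-once approach for the one-column table in the
# question; the connection details and file path are placeholders.
my $dbh = DBI->connect('DBI:mysql:database=mydb', 'user', 'pass',
                       { RaiseError => 1 });

my $sth = $dbh->prepare('INSERT IGNORE INTO com_current (domain) VALUES (?)');

open my $fh, '<', '/path/to/domains.txt' or die "Can't open file: $!";
while (my $line = <$fh>) {
    chomp $line;
    $sth->execute($line);   # DBI quotes and escapes the value for you
}
close $fh;
$dbh->disconnect;

A side benefit is that the placeholder handles quoting and escaping, which the hand-built string in StoreToDB currently doesn't.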
Upvotes: 4
Reputation: 48367
If you're starting with an empty table, or there are fewer rows in the table than you are inserting, then disabling indexes will speed things up significantly.
ALTER TABLE tbl_name DISABLE KEYS;
INSERT....
INSERT....
INSERT...
...
ALTER TABLE tbl_name ENABLE KEYS;
OTOH if you've already got a lot of data in there, it may actually slow things down.
Batching up the inserts will help with performance (particularly if indexes are enabled), e.g. from the mysql manual:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
(and looking at what you're doing, you might consider using INSERT IGNORE... and sorting the list first).
But one of the best ways to improve performance for bulk inserts is to load the data into a separate, dedicated table first, then use INSERT....SELECT... (using an ORDER BY on the SELECT statement based on the most heavily used index will help to keep it balanced).
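For the single-column table in the question, that two-step approach might look roughly like this (a sketch only; the staging table name, file path, and connection details are assumptions):

use strict;
use warnings;
use DBI;

# Sketch of the dedicated-staging-table idea; names and paths are made up.
my $dbh = DBI->connect('DBI:mysql:database=mydb', 'user', 'pass',
                       { RaiseError => 1 });

# 1. An unindexed staging table that the raw lines get bulk-loaded into
#    (by LOAD DATA INFILE, batched INSERTs, or whatever is convenient).
$dbh->do('CREATE TABLE IF NOT EXISTS com_staging (domain VARCHAR(60) NOT NULL) ENGINE=MyISAM');

# ... bulk-load /path/to/domains.txt into com_staging here ...

# 2. Copy across in index order so the PRIMARY KEY on `domain` is filled
#    sequentially rather than by random insertions.
$dbh->do('INSERT IGNORE INTO com_current (domain)
          SELECT domain FROM com_staging ORDER BY domain');

$dbh->disconnect;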
Upvotes: 4
Reputation: 2316
Using LOAD DATA INFILE, as per ypercube's answer, is probably the way to go. As an alternative, you could also start a transaction, then commit it every 500 or so inserts and start a new one. This tends to optimize disk access by storing the transaction in memory and doing the writes all at once.
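A minimal DBI sketch of that pattern is below. Note that this only applies to a transactional engine such as InnoDB; the com_current table in the question is MyISAM, which ignores transactions. The connection details and file path are placeholders.

use strict;
use warnings;
use DBI;

# Sketch: run the inserts inside explicit transactions and commit every
# 500 rows, so the writes hit disk in batches. Assumes an InnoDB table.
my $dbh = DBI->connect('DBI:mysql:database=mydb', 'user', 'pass',
                       { RaiseError => 1, AutoCommit => 0 });

my $sth = $dbh->prepare('INSERT IGNORE INTO com_current (domain) VALUES (?)');

open my $fh, '<', '/path/to/domains.txt' or die "Can't open file: $!";
my $rows = 0;
while (my $line = <$fh>) {
    chomp $line;
    $sth->execute($line);
    $dbh->commit unless ++$rows % 500;   # commit a full batch of 500
}
$dbh->commit;                            # commit the final partial batch
close $fh;
$dbh->disconnect;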
Upvotes: 2
Reputation: 115560
You can use LOAD DATA INFILE syntax, instead of sending one row per insert statement.
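For the one-column table in the question, issued through DBI, that would look roughly like this (a sketch; the file path and connection details are placeholders, and the LOCAL variant may need local_infile enabled on both client and server):

use strict;
use warnings;
use DBI;

# Sketch: one LOAD DATA statement instead of building INSERT strings.
# mysql_local_infile=1 may be needed by DBD::mysql for the LOCAL variant.
my $dbh = DBI->connect('DBI:mysql:database=mydb;mysql_local_infile=1',
                       'user', 'pass', { RaiseError => 1 });

$dbh->do(q{LOAD DATA LOCAL INFILE '/path/to/domains.txt'
           IGNORE INTO TABLE com_current
           LINES TERMINATED BY '\n' (domain)});

$dbh->disconnect;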
Upvotes: 5