I am having some sort of I/O bottleneck inserting parsed data into a MySQL database. So instead of parsing the file and inserting each record into the database individually, I would like to write the records to a file (such as a CSV) and then load that file into the database manually.
How can I approach this using Perl? Below is the code that currently inserts the data into the database, but I would like to write the data to a file instead, in a format that makes inserting it into MySQL later as painless as possible.
#INSERT INTO ANNOUNCE TABLE
foreach my $au (@ANNOUNCED) {
    my $val        = $au->[0];
    my $IP         = $prefix->ip();
    my $subnetmask = $prefix->mask();
    $Announce_update->execute($IP, $subnetmask, $UpdateKey);
}
Upvotes: 0
Views: 1144
Reputation: 6378
Text::CSV will probably make your life easier at this point. Install that module if it is not already on your system and then add this line to your script:
use Text::CSV;
Comment out the database insertion bits in your code:
# $dbh->prepare("SET FOREIGN_KEY_CHECKS = 0;")->execute();
# $Announce_update->execute($IP,$subnetmask,$UpdateKey);
and then try grabbing the data you want:
my @csvdata = ($IP, $subnetmask, $UpdateKey); # assuming data is in these vars
                                              # and they aren't references used
                                              # by $Announce_update somehow
then write it out to a CSV file somewhere (cribbed from the documentation), which you will load "manually" (by which I assume you mean using the MySQL console or CLI tool):
my $csv = Text::CSV->new({ binary => 1, eol => "\n" });
open my $fh, ">:encoding(utf8)", "csvdata.csv" or die "csvdata.csv: $!";
$csv->print($fh, \@csvdata);    # one row: an arrayref of fields
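Putting those pieces together, a self-contained sketch (the sample rows are made up; substitute whatever you actually pull out of @ANNOUNCED):
use strict;
use warnings;
use Text::CSV;

# Made-up rows standing in for the ($IP, $subnetmask, $UpdateKey) triples
# gathered in the original loop
my @rows = (
    [ '192.0.2.1',    '255.255.255.0', 42 ],
    [ '198.51.100.7', '255.255.0.0',   42 ],
);

my $csv = Text::CSV->new({ binary => 1, eol => "\n" });
open my $fh, ">:encoding(utf8)", "csvdata.csv" or die "csvdata.csv: $!";
$csv->print($fh, $_) for @rows;    # one arrayref per CSV line
close $fh or die "close: $!";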
Hope this helps - though it is a bit short on hand-holding. Even if it does work, I'm not sure you are making your life easier by doing this ;-)
Upvotes: 1
Reputation: 24063
To load data from an external file, MySQL provides the LOAD DATA INFILE command. LOAD DATA INFILE is fairly flexible when it comes to input file formats, allowing you to specify the delimiter, EOL character, whether fields are quoted, etc. It works great in concert with Text::CSV, which you can use to output a delimited file containing your data.
First, write your data to a file:
use Text::CSV;
my $csv = Text::CSV->new({ eol => "\n" }) or die Text::CSV->error_diag();
my $infile = "/path/to/file";
open my $fh, ">", $infile or die $!;
for my $i (0..$#ANNOUNCED)
{
    # Don't end the last line with "\n" or we'll get a garbage row when we
    # load the file into the database
    $csv->eol(undef) if $i == $#ANNOUNCED;

    # Generate the data to insert for this row
    # Write to file
    $csv->print($fh, [ $IP, $subnetmask, $UpdateKey ]);
}
# Close file handle to flush the buffer
close $fh;
Note that you must close the file handle after writing to it or MySQL may not get all of the data.
Next, load the file:
my $query = "LOAD DATA LOCAL INFILE '$infile' INTO TABLE table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
LINES TERMINATED BY '\\n'";
eval {
# $dbh is an already open database handle
my $rows_inserted = $dbh->do($query);
};
die $@ if $@;
The LOCAL keyword affects whether the file is located on the MySQL server or not. If you're loading to a database located on localhost, you can leave off the LOCAL keyword, counter-intuitive as that may sound (with LOCAL, the client sends a copy of the file to the server's temporary directory and the server reads it from there, so it is local to the server). Without the LOCAL keyword, the file must be readable by all (i.e. permissions of at least 0644 on *nix, since you also have to write to it). MySQL will also look in different places for relative paths depending on whether LOCAL is used or not; see the documentation for details.
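If you go the non-LOCAL route, for example, you can make sure the file is readable by the server before issuing the query; a minimal sketch (assumes *nix, and $infile from the snippet above):
# Server-side LOAD DATA (no LOCAL): the MySQL server process reads the file
# itself, so it must be world-readable
chmod 0644, $infile or die "chmod $infile: $!";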
The above query assumes that your table only has three columns. If it has more, you need to specify which columns you're inserting data for, in the order they appear in the CSV, e.g.:
my $query = "LOAD DATA LOCAL INFILE '$infile' INTO TABLE table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
LINES TERMINATED BY '\\n'
(ip, subnetmask, updatekey)";
LOAD DATA INFILE is by far the fastest method if you're inserting a large number of rows at once. For smaller inserts, however, the additional I/O overhead of writing to and reading from a temporary file (especially if you use the LOCAL option) makes it slower than using compound inserts like this one:
# Inserts 3 rows at once
INSERT INTO table VALUES (foo, foo, foo), (bar, bar, bar), (baz, baz, baz)
There is a magic number of rows for which LOAD DATA INFILE becomes faster than compound inserts. For my database and application, I did some profiling and found that this number was on the order of 100, but it is almost certainly different for you. I wrote a function to select the most efficient method depending on the number of rows of data to be inserted:
sub insert_rows {
    my $data = shift;                 # Reference to an AoA
    my $num_rows = scalar @{ $data }; # row count, not the last index
    if ($num_rows < 100) {
        # Generate compound insert statement
    }
    else {
        # LOAD DATA INFILE
    }
}
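One way to flesh out the compound-insert branch is to build the statement with ?-placeholders so DBI still handles all the quoting; a rough sketch (the helper name is mine, and the table/column names are borrowed from the earlier examples):
# Hypothetical helper: build a single multi-row INSERT using placeholders
sub compound_insert {
    my ($dbh, $data) = @_;    # $data is a reference to an AoA
    my $row = "(" . join(",", ("?") x 3) . ")";
    my $sql = "INSERT INTO table (ip, subnetmask, updatekey) VALUES "
            . join(",", ($row) x @$data);
    # Flatten the AoA into one bind list; DBI quotes each value
    $dbh->do($sql, undef, map { @$_ } @$data);
}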
Be aware that the default value of max_allowed_packet is only 4MB in MySQL 5.7. If your compound insert statement exceeds this, you will get a Packet too large error and the insert will fail. You can adjust this up to a maximum of 1GB, but by that point you will probably have reached the threshold where LOAD DATA INFILE is more efficient anyway.
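To see what your server is currently set to, a quick check via DBI (assuming $dbh as above):
# Ask the server for its current max_allowed_packet setting (in bytes)
my ($name, $max_packet) = $dbh->selectrow_array(
    "SHOW VARIABLES LIKE 'max_allowed_packet'");
print "max_allowed_packet = $max_packet bytes\n";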
Upvotes: 1