user2691659

How to write data to a file for loading into MySQL database?

I am running into an I/O bottleneck when inserting parsed data into a MySQL database. Instead of parsing the file and inserting each record into the database individually, I would like to write the records to a file (such as a CSV) and then load that file into the database manually.

How can I approach this using Perl? Below is the code that currently inserts the data into the database, but I would like to write the data to a file instead, in a format that makes inserting it into MySQL later as painless as possible.

#INSERT INTO ANNOUNCE TABLE
foreach my $au (@ANNOUNCED) {

    my $prefix     = $au->[0];
    my $IP         = $prefix->ip();
    my $subnetmask = $prefix->mask();

    $Announce_update->execute($IP, $subnetmask, $UpdateKey);

}

Upvotes: 0

Views: 1144

Answers (2)

G. Cito

Reputation: 6378

Using Text::CSV will probably make your life easier at this point. Install that module if it is not already on your system and then add this line to your script:

use Text::CSV;

Comment out the database insertion bits in your code:

 # $dbh->prepare("SET FOREIGN_KEY_CHECKS = 0;")->execute();
 # $Announce_update->execute($IP,$subnetmask,$UpdateKey);

and then try grabbing the data you want:

 my @csvdata = ($IP,$subnetmask,$UpdateKey); # assuming data is in these vars 
                                              # and they aren't references used 
                                              # by $Announce_update somehow

and then write it out to a CSV file somewhere (the snippet below is cribbed from the documentation) which you will load "manually" (by which I assume you mean using the MySQL console or CLI tool):

  my $csv = Text::CSV->new({ binary => 1 });
  open my $fh, ">:encoding(utf8)", "csvdata.csv" or die "csvdata.csv: $!";
  $csv->print($fh, \@csvdata);   # print() takes an array ref of fields
  close $fh;
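
Putting the pieces together, a minimal sketch of the whole rewritten loop might look like this. It assumes @ANNOUNCED and $UpdateKey are populated by your parsing code exactly as in the question, and that each $au->[0] is the prefix object:

use Text::CSV;

my $csv = Text::CSV->new({ binary => 1, eol => "\n" });
open my $fh, ">:encoding(utf8)", "csvdata.csv" or die "csvdata.csv: $!";

# @ANNOUNCED and $UpdateKey come from your existing parsing code
foreach my $au (@ANNOUNCED) {
    my $prefix     = $au->[0];
    my $IP         = $prefix->ip();
    my $subnetmask = $prefix->mask();

    # One CSV row per record instead of one INSERT per record
    $csv->print($fh, [ $IP, $subnetmask, $UpdateKey ]);
}

close $fh;

Setting eol => "\n" in the constructor makes each print() call end its row with a newline, so you don't have to add line endings yourself.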

Hope this helps, though it is a bit short on hand-holding. Even if it does work, I'm not sure you are making your life easier by doing this ;-)

Upvotes: 1

ThisSuitIsBlackNot

Reputation: 24063

To load data from an external file, MySQL provides the LOAD DATA INFILE command. LOAD DATA INFILE is fairly flexible when it comes to input file formats, allowing you to specify the delimiter, EOL character, whether fields are quoted, etc. It works great in concert with Text::CSV, which you can use to output a delimited file containing your data.

First, write your data to a file:

use Text::CSV;

my $csv = Text::CSV->new({ eol => "\n" }) or die Text::CSV->error_diag();

my $infile = "/path/to/file";
open my $fh, ">", $infile or die $!;

for my $i (0 .. $#ANNOUNCED) {
    # Don't end the last line with "\n" or we'll get a garbage row when we
    # load the file into the database
    $csv->eol(undef) if $i == $#ANNOUNCED;

    # Generate the data to insert for this row

    # Write to file
    $csv->print($fh, [ $IP, $subnetmask, $UpdateKey ]);
}

# Close file handle to flush the buffer
close $fh;

Note that you must close the file handle after writing to it or MySQL may not get all of the data.

Next, load the file:

my $query = "LOAD DATA LOCAL INFILE '$infile' INTO TABLE table
             FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
             LINES TERMINATED BY '\\n'";

eval {
   # $dbh is an already open database handle
   my $rows_inserted = $dbh->do($query);
};
die $@ if $@;
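
One caveat if you connect through DBI/DBD::mysql: the client must have LOCAL INFILE support enabled, or the LOAD DATA LOCAL statement may be rejected (the server's local_infile setting has to allow it too). A minimal sketch of a connect call with that option turned on; the database name, host, and credentials are placeholders:

use DBI;

# mysql_local_infile=1 enables LOAD DATA LOCAL INFILE on the client side
my $dbh = DBI->connect(
    "DBI:mysql:database=mydb;host=localhost;mysql_local_infile=1",
    $user, $password,
    { RaiseError => 1 },
);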

The LOCAL keyword affects where MySQL expects to find the file. If you're loading into a database located on localhost, you can leave off the LOCAL keyword, counter-intuitive as that may sound: with LOCAL, the client sends a copy of the file to the server's temporary directory and the server reads it from there, so the copy ends up local to the server anyway. Without the LOCAL keyword, the file must already be on the server host and be readable by all (i.e. permissions of at least 0644 on *nix, since you also have to write to it). MySQL also looks in different places for relative paths depending on whether LOCAL is used or not; see the documentation for details.

The above query assumes that your table only has three columns. If it has more, you need to specify which columns you're inserting data for, in the order they appear in the CSV, e.g.:

my $query = "LOAD DATA LOCAL INFILE '$infile' INTO TABLE table
             FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
             LINES TERMINATED BY '\\n'
             (ip, subnetmask, updatekey)";

Compound inserts

LOAD DATA INFILE is by far the fastest method if you're inserting a large number of rows at once. For smaller inserts, however, the additional I/O overhead of writing to and reading from a temporary file (especially if you use the LOCAL option) makes it slower than using compound inserts like this one:

# Inserts 3 rows at once
INSERT INTO table VALUES (foo, foo, foo), (bar, bar, bar), (baz, baz, baz)

There is a magic number of rows for which LOAD DATA INFILE becomes faster than compound inserts. For my database and application, I did some profiling and found that this number was on the order of 100, but it is almost certainly different for you. I wrote a function to select the most efficient method depending on the number of rows of data to be inserted:

sub insert_rows {
    my $data = shift; # Reference to an AoA

    my $num_rows = @{ $data }; # number of rows (array in scalar context)

    if ($num_rows < 100) {
        # Generate compound insert statement
    }
    else {
        # LOAD DATA INFILE
    }
}
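
For the first branch, one way to build the compound insert is to generate a list of placeholder groups and let DBI handle the quoting. A sketch, assuming three columns per row and the table and column names from the examples above:

sub compound_insert {
    my ($dbh, $data) = @_; # $data is a reference to an AoA

    # One "(?, ?, ?)" group per row, joined into a single statement
    my $placeholders = join ', ', ('(?, ?, ?)') x @{ $data };

    my $sth = $dbh->prepare(
        "INSERT INTO table (ip, subnetmask, updatekey) VALUES $placeholders"
    );

    # Flatten the AoA into one list of bind values
    $sth->execute(map { @$_ } @{ $data });
}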

Be aware that the default value of max_allowed_packet is small (4MB in MySQL 5.7, and only 1MB in versions before 5.6.6). If your compound insert statement exceeds this, you will get a Packet too large error and the insert will fail. You can adjust this up to a maximum of 1GB, but by that point you will have probably reached the threshold where LOAD DATA INFILE is more efficient anyway.
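
If you want to check the limit at runtime before choosing a strategy, you can ask the server directly through the same handle:

# SHOW VARIABLES returns the variable name and its value in bytes
my (undef, $max_packet) = $dbh->selectrow_array(
    "SHOW VARIABLES LIKE 'max_allowed_packet'"
);
print "max_allowed_packet is $max_packet bytes\n";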

Upvotes: 1
