someuser
someuser

Reputation: 2299

Perl Module Instantiation + DBI + Forks "Mysql server has gone away"

I have written a perl program that parses records from csv into a db.

The program worked fine but took a long time. So I decided to fork the main parsing process.

After a bit of wrangling with fork it now works well and runs about 4 times faster. The main parsing method is quite database intensive. For interests sake, for each record that is parsed there are the following db calls:

1 - there is a check that the uniquely generated base62 is unique against a baseid map table 2 - There is an archive check to see if the record has changed 3 - The record is inserted into the db

The problem is that I began to get "Mysql has gone away" errors while the parser was being run in forked mode, so after much fiddling I came up with the following mysql config:

#
# * Fine Tuning
#
key_buffer              = 10000M
max_allowed_packet      = 10000M
thread_stack            = 192K
thread_cache_size       = 8
myisam-recover         = BACKUP
max_connections        = 10000
table_cache            = 64
thread_concurrency     = 32
wait_timeout           = 15
tmp_table_size  = 1024M

query_cache_limit       = 2M
#query_cache_size        = 100M
query_cache_size = 0
query_cache_type = 0

That seems to have fixed problems while the parser is running However, I am now getting a "Mysql server has gone away" when the next module is run after the main parser.

The strange thinf is the module causing problems involves a very simple SELECT query on a table with currently only 3 records. Run directly as a test (not after the parser) it works fine.

I tried adding a pause of 4 minutes after the parser module runs - but I get the same error.

I have a main DBConnection.pm model with this: package DBConnection;

use DBI;
use PXConfig;

sub new {
    my $class    = shift;
    ## MYSQL Connection
    my $config = new PXConfig();
    my $host     = $config->val('database', 'host');
    my $database = $config->val('database', 'db');
    my $user     = $config->val('database', 'user');
    my $pw       = $config->val('database', 'password');
    my $dsn      = "DBI:mysql:database=$database;host=$host;"; 
    my $connect2 = DBI->connect( $dsn, $user, $pw,  );
    $connect2->{mysql_auto_reconnect} = 1;
    $connect2->{RaiseError} = 1;
    $connect2->{PrintError} = 1;
    $connect2->{ShowErrorStatement} = 1;
    $connect2->{InactiveDestroy} = 1;

    my $self     = {
        connect => $connect2,
    };
    bless $self, $class;
    return $self;
}

Then all modules, including the forked parser modules, open a connection to the DB using:

package Example;

use DBConnection;

sub new {
    my $class    = shift;
    my $db       = new DBConnection;
    my $connect2 = $db->connect();
    my $self     = {
        connect2 => $connect2,
    };
    bless $self, $class;
    return $self;
}

The question is if I have Module1.pm that calls Module2.pm that calls Module3.pm and each of them instantiates a connection with the DB as shown above (ie in the constructor) then are they using different connections to the database or the same connection?

What I wondered is if the script takes say 6 hours to finish, if the top level call to the db connection is timing out the lower level db connection even though the lower level module is making its 'own' connection.

It is very frustrating trying to find the problem as I can only reproduce the error after running a very long parse process.

Sorry for the long question, thanks in advance to anyone who can give me any ideas.


UPDATE 1:

Here is the actual forking part:

my $fh = Tie::Handle::CSV->new( "$file", header => 1 );
while ( my $part = <$fh> ) {
    if ( $children == $max_threads ) {
        $pid = wait();
        $children--;
    }
    if ( defined( $pid = fork ) ) {
        if ($pid) {
            $children++;
        } else {
            $cfptu = new ThreadedUnit();
            $cfptu->parseThreadedUnit($part, $group_id, $feed_id);
        }
    }
}

And then the ThreadedUnit:

package ThreadedUnit;

use CollisionChecker;
use ArchiveController;
use Filters;
use Try::Tiny;
use MysqlLogger;

sub new {
    my $class    = shift;
    my $db       = new DBConnection;
    my $connect2 = $db->connect();
    my $self     = {
        connect2 => $connect2,
    };
    bless $self, $class;
    return $self;
}

sub parseThreadedUnit {
    my ( $self, $part, $group_id, $feed_id ) = @_;
    my $connect2 = $self->{connect2};

    ## Parsing stuff

    ## DB Update in try -> catch
    exit();
}

So as I understand the DB connection is being called after the forking.

But, as I mentioned above the forked code outlined just above works fine. It is the next module that does not work which is being run from a controller module which just runs through each worker module one at time (the parser being one of them) - the controller module does not create a DB connection in its construct or anywhere else.


Update 2

I forgot to mention that I don't get any errors in the 'problem' module following the parser if I only parse a small number of files and not the full queue.

So it is almost as if the intensive forked parsing and accessing the DB makes it un-available for normal non-forked processes just after it ends for some undetermined time.

The only thing I have noticed when the parser run finishes in Mysql status is the Threads_connected sits around, say, 500 and does not decrease for some time.

Upvotes: 4

Views: 1861

Answers (2)

bohica
bohica

Reputation: 5992

Read dan1111's answer but I suspect you are connecting then forking. When the child completes the DBI connection handle goes out of scope and is closed. As dan1111 says you are better connecting in the child for all the reasons he said. Read about InactiveDestroy and AutoInactiveDestroy in DBI which will help you understand what is going on.

Upvotes: 1

user1919238
user1919238

Reputation:

It depends on how your program is structured, which isn't clear from the question.

If you create the DB connection before you fork, Perl will make a copy of the DB connection object for each process. This would likely cause problems if two processes try to access the database concurrently with the same DB connection.

On the other hand, if you create the DB connections after forking, each module will have its own connection. This should work, but you could have a timeout problem if Module x creates a connection, then waits a long time for a process in Module y to finish, then tries to use the connection.

In summary, here is what you want:

  • Don't have any open connections at the point you fork. Child processes should create their own connections.
  • Only open a connection right before you want to use it. If there is a point in your program when you have to wait, open the connection after the waiting is done.

Upvotes: 2

Related Questions