Reputation: 53
As you all know, when you fork, the child gets a copy of everything, including file and network descriptors (see man fork).
In PHP, when you use pcntl_fork, all of your connections created with mysql_connect are copied, and this is somewhat of a problem (see the PHP docs and this SO question). Common sense says: close the parent's connection, create a new one for the parent, and let the child use the old one. But what if the parent needs to create many children every few seconds? In that case you end up creating loads of new connections, one for every batch of forks.
In code, that looks like this:
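```php
// $host, $user, $pass and $jobs are assumed to be set elsewhere.
while (42) {
    $db = mysql_connect($host, $user, $pass);
    // do some stuff with $db
    // ...
    foreach ($jobs as $job) {
        if (($pid = pcntl_fork()) == -1) {
            continue;          // fork failed: skip this job
        } else if ($pid) {
            continue;          // parent: move on to the next job
        }
        fork_for_job($job);    // child: never returns
    }
    mysql_close($db);
    wait_children();
    sleep(5);
}

function fork_for_job($job) {
    // do something.
    // does not use the global $db
    // ...
    exit(0);
}

// Reap every child forked in this round.
function wait_children() {
    while (pcntl_wait($status) > 0) {
        // keep waiting until no children are left
    }
}
```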
Well, I do not want to do that; that's way too many connections to the database. Ideally I would like to achieve behaviour similar to this:
```php
// One connection, opened once, for the lifetime of the parent.
$db = mysql_connect($host, $user, $pass);
while (42) {
    // do some stuff with $db
    // ...
    foreach ($jobs as $job) {
        if (($pid = pcntl_fork()) == -1) {
            continue;          // fork failed: skip this job
        } else if ($pid) {
            continue;          // parent: move on to the next job
        }
        fork_for_job($job);    // child: never returns
    }
    wait_children();           // reaps all children forked in this round
    sleep(5);
}

function fork_for_job($job) {
    // do something
    // does not use the global $db
    // ...
    exit(0);
}
```
Do you think it is possible?
Some other things:
Upvotes: 5
Views: 3510
Reputation: 72425
My advice (from personal experience with the same issue) is to close the connection before calling pcntl_fork(), then open new connections in the parent and/or the child process as needed.
If you open a new connection in the parent process, you have to block the SIGCHLD signal (using pcntl_sigprocmask(SIG_BLOCK, array(SIGCHLD))). No special care is needed in the child processes (unless they also launch their own children, becoming parents themselves).
SIGCHLD is the signal the parent process receives when one of its children terminates.
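A minimal sketch of this advice, under assumptions that are not part of the answer: it targets mysqli (the mysql_* functions were removed in PHP 7.0), the connection is created through a caller-supplied $connect callable, and run_batch() and $work are hypothetical names. The parent closes its connection before forking, lets each child open its own connection if it needs one, and blocks SIGCHLD before reconnecting.

```php
<?php
// Sketch only: "close before pcntl_fork(), reconnect afterwards".
// Assumptions: $db is a mysqli-like object with close(); $connect returns a
// fresh connection; $work is a hypothetical per-job function.
// Requires the pcntl extension (CLI).

function run_batch($db, callable $connect, array $jobs, callable $work)
{
    // Close first, so no child inherits the parent's socket.
    $db->close();

    $pids = [];
    foreach ($jobs as $job) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            continue;               // fork failed: this job is skipped
        }
        if ($pid === 0) {
            $work($job, $connect);  // child opens its own connection if needed
            exit(0);
        }
        $pids[] = $pid;             // parent: remember the child
    }

    // Keep SIGCHLD from interrupting the parent's MySQL traffic. It stays
    // blocked; pcntl_waitpid() reaps children without needing the signal.
    pcntl_sigprocmask(SIG_BLOCK, [SIGCHLD]);
    $db = $connect();

    foreach ($pids as $pid) {
        pcntl_waitpid($pid, $status);
    }
    return $db;
}
```

With a real server it might be called as `$db = run_batch($db, fn() => new mysqli($host, $user, $pass), $jobs, $work);` (arrow functions need PHP 7.4). Leaving SIGCHLD blocked for good is a simplification; if the parent wants to react to the signal, it can restore the previous mask once its MySQL work is done.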
During communication with the server, the MySQL client library uses nanosleep() to suspend the program for short periods. The sleep functions return when the time has elapsed, but they also return early if the process receives a signal while it is suspended.
When nanosleep() returns because of a signal (i.e. before enough time has passed), the MySQL library gets confused, reports the error "MySQL server has gone away", and the connection cannot be used any more. It is a false alarm: the MySQL server is still there waiting for queries, but the client code was fooled by a signal that arrived at the wrong moment.
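The interruption itself can be observed without MySQL. This sketch assumes PHP 7.1+ with the pcntl extension; the 2-second sleep and the 100 ms child delay are arbitrary. It uses PHP's time_nanosleep(), which returns an array with the remaining time instead of true when a signal cuts the sleep short. Note that it installs a SIGCHLD handler first: the default action for SIGCHLD is to ignore it, and an ignored signal does not interrupt the sleep.

```php
<?php
// Shows that a SIGCHLD arriving during a sleep cuts it short.
// Needs PHP 7.1+ (pcntl_async_signals) and the pcntl extension (CLI).

pcntl_async_signals(true);

// A handler must be installed: with the default action, SIGCHLD is
// ignored and does not interrupt the sleep.
pcntl_signal(SIGCHLD, function (int $signo): void {
    // Nothing to do; the child is reaped with pcntl_waitpid() below.
});

$pid = pcntl_fork();
if ($pid === -1) {
    exit("fork failed\n");
}
if ($pid === 0) {
    // Child: let the parent enter the sleep, then exit, which makes the
    // kernel send SIGCHLD to the parent.
    usleep(100000);
    exit(0);
}

// Parent: request a 2-second sleep.
$result = time_nanosleep(2, 0);

if (is_array($result)) {
    // Interrupted: the array holds the time that was left.
    printf("interrupted, %d s %d ns left\n", $result['seconds'], $result['nanoseconds']);
} else {
    echo "slept the full 2 seconds\n";
}

pcntl_waitpid($pid, $status);
```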
If you are interested in receiving the SIGCHLD signal, you can block it before running a MySQL query and unblock it again afterwards (so that it is not delivered during communication with the MySQL server).
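Blocking and unblocking around each call could be wrapped in a small helper. In this sketch with_sigchld_blocked() is a hypothetical name, and a 300 ms time_nanosleep() stands in for the MySQL query so that the example runs without a server; it assumes PHP 7.1+ with pcntl. The child exits while the signal is blocked, so the sleep completes, and the handler only runs once the previous mask is restored.

```php
<?php
// Needs PHP 7.1+ and the pcntl extension (CLI).

pcntl_async_signals(true);

$childExited = false;
pcntl_signal(SIGCHLD, function (int $signo) use (&$childExited): void {
    $childExited = true;
});

// Hypothetical helper: run $fn with SIGCHLD blocked, then restore the
// previous mask. In real code $fn would send the MySQL query.
function with_sigchld_blocked(callable $fn)
{
    pcntl_sigprocmask(SIG_BLOCK, [SIGCHLD], $oldMask);
    try {
        return $fn();
    } finally {
        // Restoring the old mask delivers any SIGCHLD that arrived meanwhile.
        pcntl_sigprocmask(SIG_SETMASK, $oldMask);
    }
}

$pid = pcntl_fork();
if ($pid === -1) {
    exit("fork failed\n");
}
if ($pid === 0) {
    usleep(100000); // exit while the parent is inside the blocked section
    exit(0);
}

// Stand-in for a query: a 300 ms sleep that the child's exit would
// otherwise interrupt.
$sleepResult = with_sigchld_blocked(function () {
    return time_nanosleep(0, 300000000);
});

pcntl_waitpid($pid, $status);
pcntl_signal_dispatch(); // make sure the queued handler has run

var_dump($sleepResult); // expected bool(true): the sleep was not cut short
var_dump($childExited); // expected bool(true): handled after unblocking
```

Using try/finally means the mask is restored even if the wrapped call throws, so a failed query does not leave SIGCHLD blocked by accident.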
Also read this answer and this answer I wrote on similar questions (it's the same information, but with more details and explanation.)
Upvotes: 1
Reputation: 63616
If the child calls exec() or _exit() fairly quickly, you're alright. The problem is if the child sticks around and holds on to copies of your file descriptors.
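PHP has no direct counterpart of _exit(), but it does have pcntl_exec(). As a sketch, the child below replaces itself with another program immediately after the fork, so neither PHP code nor PHP's shutdown sequence runs in the child with the inherited descriptors. The command `/bin/sh -c 'exit 3'` is only a placeholder for a real worker, such as a separate PHP script started via PHP_BINARY.

```php
<?php
// Needs the pcntl extension (CLI).

$pid = pcntl_fork();
if ($pid === -1) {
    exit("fork failed\n");
}
if ($pid === 0) {
    // Child: replace the PHP process at once. pcntl_exec() returns only on failure.
    pcntl_exec('/bin/sh', ['-c', 'exit 3']);
    exit(127); // reached only if the exec failed
}

// Parent: reap the child and read its exit status.
pcntl_waitpid($pid, $status);
$exitCode = pcntl_wifexited($status) ? pcntl_wexitstatus($status) : null;
echo "worker exited with code $exitCode\n";
```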
You could also use posix_spawn if PHP has an API for that. That might work well.
Upvotes: 0
Reputation: 11638
The only thing you could try is to make each child wait until the other children have finished their jobs. This way you could use the same database connection (provided there aren't any synchronization issues). But of course you'll have a lot of processes, which is not very good either (in my experience PHP has quite a big memory footprint). If having multiple processes access the same database connection is not a problem, you could try to make "groups" of processes that share a connection. That way you don't have to wait until each job has finished (you can clean up when the whole group has finished), and you don't end up with a lot of connections either.
You should ask yourself whether you really need a database connection for your worker processes. Why not let the parent fetch the data and write your results to a file?
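A sketch of that split, assuming the parent has already fetched its rows (the hard-coded $rows array and process_row() are placeholders): each child receives its row through the memory copied by fork, writes its result to a temporary file, and the parent collects the files after reaping the children. Only the parent would ever hold a database connection.

```php
<?php
// Needs PHP 7.1+ (list destructuring in foreach) and the pcntl extension.

// Hypothetical per-row work done in the child.
function process_row(array $row): string
{
    return strtoupper($row['name']);
}

// Stand-in for rows the parent fetched over its own connection.
$rows = [['id' => 1, 'name' => 'alpha'], ['id' => 2, 'name' => 'beta']];
$dir = sys_get_temp_dir();
$children = [];

foreach ($rows as $row) {
    $file = tempnam($dir, 'job');
    $pid = pcntl_fork();
    if ($pid === -1) {
        unlink($file);      // fork failed: drop the unused file
        continue;
    }
    if ($pid === 0) {
        // Child: no database access, just write the result and leave.
        file_put_contents($file, process_row($row));
        exit(0);
    }
    $children[$pid] = [$row['id'], $file];
}

// Parent: reap each child, then read and remove its result file.
$results = [];
foreach ($children as $pid => [$id, $file]) {
    pcntl_waitpid($pid, $status);
    $results[$id] = file_get_contents($file);
    unlink($file);
}
// $results can now be written back over the parent's single connection.
```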
If you do need the connection, you should consider using another language for the job. PHP's CLI itself is not a "typical" use case (it was added in 4.3), and multiprocessing is more of a hack than a supported feature.
Upvotes: 2