Paul Anderson

Reputation: 21

What could cause gunzip/gzip to hang in Perl threads?

The script I am writing has multiple threads. Each of these threads is responsible for a considerable amount of IO. I am using Perl 5.8.3.

The following file processing is necessary:
1) Open a gzipped file to read the contents into some variable.
2) Close the input stream from gzip/gunzip.
3) Perform arbitrary calculations given the data in the variable.

I have tried a couple of different ways of gunzipping a file to get the file contents:

$someVariable = `gunzip -c /path/to/file.gz`;

AND

$someVariable = "";
open(my $INPUT,'gunzip -c /path/to/file.gz|');

while(my $line = <$INPUT>){
    $someVariable .= $line;
}
close($INPUT);

The whole run is expected to take several hours, but gunzip seems to get stuck on random files. There is nothing particularly special about the files being read. The ones that get stuck are different every time, and sometimes no files get stuck at all (processing the same batch). This is what the process information looks like (using ps aux | grep gunzip):

username 12345  0.0  0.0   1752   400 pts/3    S    May27   0:00 gunzip -c /path/to/file.gz

I'm open to suggestions and questions regarding the program. I can only post generic portions of code. Additionally, I have already read this post (How to deal with multiple threads in perl which turn into zombie). My issue seems similar to the one 'Gahoo' had, but no solution was posted there (his final comment indicated something related to the issue I am having).

Thanks!
Paul

Upvotes: 0

Views: 1098

Answers (3)

Matt Amrein

Reputation: 1

I ran into this same problem on a much later version (5.20.1) on Linux. While I did not find a definitive solution, I did come up with a workaround: use a system() call for gunzip, redirect its output to a temporary file (I appended the thread ID to the temporary file name), then read that temporary file back in with a standard open() call. Based on this, it seems the issue is with the use of stdout when using the gzip methods above. This workaround is far from ideal and can probably be made more robust, but it is acceptable in certain situations. Example:

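my $tmp = "tmp_file" . threads->tid();    # one temporary file per thread
system("gunzip -c $filename > $tmp") == 0 or die "gunzip failed: $?";
open my $fh, "<", $tmp or die $!;
my $output = do { local $/; <$fh> };      # slurp the whole file, not just the first line
close $fh;
unlink $tmp;                              # remove the temporary file when done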

Upvotes: 0

Phil Gabardo

Reputation: 142

I've experienced this issue when gzipping in Perl threads that were dispatched in Windows using Cygwin. However, this issue does not appear when gzipping in Perl threads that were dispatched in Linux. This leads me to believe that this is a Cygwin bug. You have two options to resolve this:

  1. Run your script in Linux.
  2. Use IO::Uncompress::Gunzip (http://perldoc.perl.org/IO/Uncompress/Gunzip.html) instead of gzip/gunzip. This implementation does not hang, but it is much, much slower.
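A minimal sketch of option 2, assuming IO::Compress::Gzip and IO::Uncompress::Gunzip are installed (both ship with recent Perl releases). The helper name slurp_gz and the temporary test file are made up for illustration and are not from the question. It decompresses entirely inside the Perl process, so no child gunzip process or pipe is involved:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical helper: read a whole .gz file into a scalar
# without spawning an external gunzip process.
sub slurp_gz {
    my ($path) = @_;
    my $contents;
    gunzip $path => \$contents
        or die "gunzip of $path failed: $GunzipError\n";
    return $contents;
}

# Build a small gzipped file so the sketch runs on its own.
my ($tmp_fh, $path) = tempfile(SUFFIX => '.gz', UNLINK => 1);
close $tmp_fh;
my $original = "line one\nline two\n";
gzip \$original => $path
    or die "gzip failed: $GzipError\n";

# Stands in for the backtick / piped-open versions in the question.
my $someVariable = slurp_gz($path);
print length($someVariable), " bytes read\n";
```

In a threaded script, each worker would call slurp_gz on its own files. Whether this avoids the hang on a given platform is something to verify rather than assume.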

Upvotes: 0

ikegami

Reputation: 385877

Assuming you're correct that it's backticks or open -|, then it's a bug in Perl, and it's probably one of the many thread bugs that have been fixed since decade-old 5.8.3.

Upvotes: 1
