Reputation: 96966
I am setting up a hash reference containing file handles.
The fourth column of my input file contains an identifier field that I am using to name the file handle's destination:
col1 col2 col3 id-0008 col5
col1 col2 col3 id-0002 col5
col1 col2 col3 id-0001 col5
col1 col2 col3 id-0001 col5
col1 col2 col3 id-0007 col5
...
col1 col2 col3 id-0003 col5
I use GNU core utilities to get a list of the identifiers:
$ cut -f4 myFile | sort | uniq
id-0001
id-0002
...
There can be more than 1024 unique identifiers in this column, and I need to open a file handle for each identifier and put that handle into a hash reference.
use File::Path qw(mkpath);   # needed for mkpath below

my $fhsRef;
my $fileOfInterest = "/foo/bar/fileOfInterest.txt";

openFileHandles($fileOfInterest);
closeFileHandles();

sub openFileHandles {
    my ($fn) = @_;
    print STDERR "getting set names... (this may take a few moments)\n";
    my $resultStr = `cut -f4 $fn | sort | uniq`;
    chomp($resultStr);
    my @setNames = split("\n", $resultStr);
    foreach my $setName (@setNames) {
        # $rootDir and $subDir are defined earlier in the script
        my $destDir = "$rootDir/$subDir/$setName";
        if (! -d $destDir) { mkpath $destDir; }
        my $destFn = "$destDir/coordinates.bed";
        local *FILE;
        print STDERR "opening handle to: $destFn\n";
        open (FILE, "> $destFn") or die "could not open handle to $destFn\n$!\n";
        $fhsRef->{$setName}->{fh} = *FILE;
        $fhsRef->{$setName}->{fn} = $destFn;
    }
}

sub closeFileHandles {
    foreach my $setName (keys %{$fhsRef}) {
        print STDERR "closing handle to: ".$fhsRef->{$setName}->{fn}."\n";
        close $fhsRef->{$setName}->{fh};
    }
}
The problem is that my code is dying at the equivalent of id-1022:
opening handle to: /foo/bar/baz/id-0001/coordinates.bed
opening handle to: /foo/bar/baz/id-0002/coordinates.bed
...
opening handle to: /foo/bar/baz/id-1022/coordinates.bed
could not open handle to /foo/bar/baz/id-1022/coordinates.bed
0
6144 at ./process.pl line 66.
Is there an upper limit in Perl to the number of file handles I can open or store in a hash reference? Or have I made another mistake elsewhere?
Upvotes: 2
Views: 2964
Reputation: 169143
There is an OS-imposed limit. Note that stdin/stdout/stderr all count as FDs. The default FD limit on Linux is 1024 per process. This question provides a bit more detail.
Note that the hard limit on most Linuxes I've used is 1024. Check /etc/security/limits.conf (the path may depend on your distro) to see if you can increase it.
You might also consider rewriting the script so that it doesn't need all of these files open at once. Either load all the data in, or provide a lazy-loading mechanism so that you load data when you need it and then close the file.
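For example, a rough sketch of the second approach: stream the input once and re-open each destination file in append mode, so at most one output handle is open at any time (untested, and it assumes the tab-delimited layout and paths from the question):

use strict;
use warnings;
use File::Path qw(mkpath);

my $rootDir = "/foo/bar";   # hypothetical values; use whatever the real script uses
my $subDir  = "baz";
my $fileOfInterest = "/foo/bar/fileOfInterest.txt";

open (my $in, "<", $fileOfInterest) or die "could not open $fileOfInterest\n$!\n";
while (my $line = <$in>) {
    chomp $line;
    my $setName = (split("\t", $line))[3];          # identifier is in the fourth column
    my $destDir = "$rootDir/$subDir/$setName";
    mkpath $destDir if ! -d $destDir;
    my $destFn = "$destDir/coordinates.bed";
    # open in append mode, write, close: only one output handle exists at a time,
    # so the per-process file descriptor limit is never reached
    open (my $out, ">>", $destFn) or die "could not open handle to $destFn\n$!\n";
    print $out "$line\n";
    close $out;
}
close $in;

Re-opening files on every line costs some speed, but it never gets anywhere near the descriptor limit.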
Upvotes: 6
Reputation: 20875
There is a limit on the number of open files per process, regardless of the programming language.
This limit is actually imposed by the operating system to prevent malicious (or buggy) programs from consuming all of the system's resources, which could freeze the OS.
If you are using a Linux-based (non-Mac) OS, check out ulimit and /etc/security/limits.conf.
ulimit -n 2048
This should work on most Linux distros.
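For a permanent change, the equivalent entries in /etc/security/limits.conf look roughly like this (hypothetical user name and values; the exact path and syntax can vary by distro):

# hypothetical example: raise the open-file limits for user "alice"
alice    soft    nofile    2048
alice    hard    nofile    4096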
I don't know the configuration for Mac (it differs from Unix on this specific point) and/or Windows.
Edit:
The limit on OS X is defined using the launchctl tool:
launchctl limit maxfiles 2048 unlimited
Upvotes: 7