Reputation: 943
I am seeing a lot of "too many open files" exceptions in the execution of my program. Typically they occur in the following form:
org.jboss.netty.channel.ChannelException: Failed to create a selector.
...
Caused by: java.io.IOException: Too many open files
However, those are not the only exceptions. I have observed similar ones (caused by "too many open files") but those are much less frequent.
Strangely enough, I have set the open-files limit of the screen session (from which I launch my programs) to 1M:
root@s11:~/fabiim-cbench# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 20
file size (blocks, -f) unlimited
pending signals (-i) 16382
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
**open files (-n) 1000000**
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Moreover, as observed in the output of lsof -p, I see no more than 1111 open files (sockets, pipes, files) before the exceptions are thrown.
Question: What is wrong, and/or how can I dig deeper into this problem?
Extra: I am currently integrating Floodlight with bft-smart. In a nutshell, the Floodlight process is the one crashing with "too many open files" exceptions when executing a stress test launched by a benchmark program. The benchmark program maintains 64 TCP connections to the Floodlight process, which in turn should maintain at least 64 * 3 TCP connections to the bft-smart replicas. Both programs use Netty to manage these connections.
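(For scale: 64 inbound connections plus 64 * 3 = 192 outbound connections is only around 256 sockets, plus whatever selectors, pipes and JARs the JVM keeps open, so the ~1111 entries I see in lsof are nowhere near the 1M limit shown above.)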
Upvotes: 2
Views: 6356
Reputation: 34873
First thing to check: can you run ulimit from inside your Java process to make sure that the file limit is the same there? (Resource limits are inherited when a process starts, so a JVM launched from a different shell, from an init script, or before you raised the limit may still be running with a much lower default.) Code like this should work:
// Runs "ulimit -a" in a child shell and copies its output to stdout.
// (Requires java.io.InputStream; the enclosing method must handle IOException.)
InputStream is = Runtime.getRuntime()
        .exec(new String[] {"bash", "-c", "ulimit -a"})
        .getInputStream();
int c;
while ((c = is.read()) != -1) {
    System.out.write(c);
}
If the limit still shows 1 million, well, you’re in for some hard debugging.
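Along the same lines, it can help to have the process report how many descriptors it actually has open when things start failing. Here is a minimal, Linux-only sketch (it relies on /proc/self/fd being available; drop it wherever the ChannelException is caught or logged):

// On Linux, /proc/self/fd holds one symlink per descriptor currently
// open in this process, so counting the entries gives the live fd count
// (sockets, pipes, epoll instances, JARs, and so on).
java.io.File[] fds = new java.io.File("/proc/self/fd").listFiles();
System.out.println("open fds: " + (fds == null ? "unknown" : fds.length));

If that number stalls somewhere just above 1000 when the exception appears, that would point at the process running with the common 1024 soft default rather than the 1M you set in the screen session.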
Here are a couple of things that I would look into if I had to debug this:
Are you running out of TCP port numbers? What does netstat -an show when you hit this error?
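A long list of connections stuck in TIME_WAIT, for example, would point at ephemeral port exhaustion rather than a descriptor limit.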
Use strace to find out exactly which system call, with which parameters, is causing this error to be thrown. EMFILE is errno 24.
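If the process is already running, you can attach with something like strace -f -p <pid> (adding -e trace=desc,network cuts down the noise) and watch for calls that fail with EMFILE, e.g. a socket() or epoll_create() returning -1 EMFILE (Too many open files).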
The “Too many open files” (EMFILE) error can actually be returned by a number of different system calls for a number of different reasons:
$ cd /usr/share/man/man2
$ zgrep -A 2 EMFILE *
accept.2.gz:.B EMFILE
accept.2.gz:The per-process limit of open file descriptors has been reached.
accept.2.gz:.TP
accept.2.gz:--
accept.2.gz:.\" EAGAIN, EBADF, ECONNABORTED, EINTR, EINVAL, EMFILE,
accept.2.gz:.\" ENFILE, ENOBUFS, ENOMEM, ENOTSOCK, EOPNOTSUPP, EPROTO, EWOULDBLOCK.
accept.2.gz:.\" In addition, SUSv2 documents EFAULT and ENOSR.
dup.2.gz:.B EMFILE
dup.2.gz:The process already has the maximum number of file
dup.2.gz:descriptors open and tried to open a new one.
epoll_create.2.gz:.B EMFILE
epoll_create.2.gz:The per-user limit on the number of epoll instances imposed by
epoll_create.2.gz:.I /proc/sys/fs/epoll/max_user_instances
eventfd.2.gz:.B EMFILE
eventfd.2.gz:The per-process limit on open file descriptors has been reached.
eventfd.2.gz:.TP
execve.2.gz:.B EMFILE
execve.2.gz:The process has the maximum number of files open.
execve.2.gz:.TP
execve.2.gz:--
execve.2.gz:.\" document ETXTBSY, EPERM, EFAULT, ELOOP, EIO, ENFILE, EMFILE, EINVAL,
execve.2.gz:.\" EISDIR or ELIBBAD error conditions.
execve.2.gz:.SH NOTES
fcntl.2.gz:.B EMFILE
fcntl.2.gz:For
fcntl.2.gz:.BR F_DUPFD ,
getrlimit.2.gz:.BR EMFILE .
getrlimit.2.gz:(Historically, this limit was named
getrlimit.2.gz:.B RLIMIT_OFILE
inotify_init.2.gz:.B EMFILE
inotify_init.2.gz:The user limit on the total number of inotify instances has been reached.
inotify_init.2.gz:.TP
mmap.2.gz:.\" SUSv2 documents additional error codes EMFILE and EOVERFLOW.
mmap.2.gz:.SH AVAILABILITY
mmap.2.gz:On POSIX systems on which
mount.2.gz:.B EMFILE
mount.2.gz:(In case no block device is required:)
mount.2.gz:Table of dummy devices is full.
open.2.gz:.B EMFILE
open.2.gz:The process already has the maximum number of files open.
open.2.gz:.TP
pipe.2.gz:.B EMFILE
pipe.2.gz:Too many file descriptors are in use by the process.
pipe.2.gz:.TP
shmop.2.gz:.\" SVr4 documents an additional error condition EMFILE.
shmop.2.gz:
shmop.2.gz:In SVID 3 (or perhaps earlier)
signalfd.2.gz:.B EMFILE
signalfd.2.gz:The per-process limit of open file descriptors has been reached.
signalfd.2.gz:.TP
socket.2.gz:.B EMFILE
socket.2.gz:Process file table overflow.
socket.2.gz:.TP
socketpair.2.gz:.B EMFILE
socketpair.2.gz:Too many descriptors are in use by this process.
socketpair.2.gz:.TP
spu_create.2.gz:.B EMFILE
spu_create.2.gz:The process has reached its maximum open files limit.
spu_create.2.gz:.TP
timerfd_create.2.gz:.B EMFILE
timerfd_create.2.gz:The per-process limit of open file descriptors has been reached.
timerfd_create.2.gz:.TP
truncate.2.gz:.\" error conditions EMFILE, EMULTIHP, ENFILE, ENOLINK. SVr4 documents for
truncate.2.gz:.\" .BR ftruncate ()
truncate.2.gz:.\" an additional EAGAIN error condition.
If you check out all these manpages by hand, you may find something interesting. For example, I think it’s interesting that epoll_create, the system call underlying NIO selectors on Linux, will return EMFILE (“Too many open files”) if
The per-user limit on the number of epoll instances imposed by /proc/sys/fs/epoll/max_user_instances was encountered. See epoll(7) for further details.
Now, that filename doesn’t actually exist on my system, but there are some limits defined in files under /proc/sys/fs/epoll and /proc/sys/fs/inotify that you might be hitting, especially if you’re running multiple instances of the same test on the same machine. Figuring out whether that’s the case is a chore in itself; you could start by checking syslog for any messages…
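A quick way to see which of these knobs exist on your kernel, and what they are currently set to, is grep . /proc/sys/fs/epoll/* /proc/sys/fs/inotify/*; you can then compare those values against how many selectors (and therefore epoll instances) the Netty boss and worker threads create across all the JVMs running on the box.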
Good luck!
Upvotes: 4