Reputation: 915
Is there a way to trace a Python process to find out where a file is being opened? lsof shows too many open files for my running process, but I'm not sure where they are being opened.
ls /proc/$pid/fd/ | wc -l
I suspect one of the libraries I'm using might not have handled the files properly. Is there a way to isolate exactly which line in my Python code opens the files?
In my code I work with 3rd party libraries to process thousands of media files and since they are being left open I receive the error
OSError: [Errno 24] Too many open files
after running for a few minutes. I know raising the limit on open files is an option, but that would only push the error to a later point in time.
Upvotes: 4
Views: 902
Reputation: 4674
Seeing open file handles is easy on Linux:
import os

open_file_handles = os.listdir('/proc/self/fd')
print('open file handles: ' + ', '.join(map(str, open_file_handles)))
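The listing above only gives descriptor numbers. On Linux each entry under /proc/self/fd is a symlink to whatever the descriptor refers to, so a minimal sketch along these lines can show which files are actually open. The function name describe_open_fds is hypothetical, and the sketch assumes a Linux-style /proc filesystem:

```python
import os


def describe_open_fds(fd_dir="/proc/self/fd"):
    """Map each open descriptor number to the target of its /proc symlink."""
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor that os.listdir() used to read the directory
            # is already closed by the time we try to resolve it.
            continue
    return targets


if __name__ == "__main__":
    for fd, target in sorted(describe_open_fds().items()):
        print(fd, "->", target)
```

Calling this periodically and comparing the results should make it fairly obvious which paths keep accumulating.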
You can also use the following on any Unix-like OS (e.g. Linux, macOS). Note that the resource module is not available on Windows:
import errno, os, resource

open_file_handles = []
# Probe every descriptor number up to the soft limit.
for fd in range(resource.getrlimit(resource.RLIMIT_NOFILE)[0]):
    try:
        os.fstat(fd)
    except OSError as e:
        # EBADF means this descriptor number is not open.
        if e.errno == errno.EBADF:
            continue
    open_file_handles.append(fd)
print('open file handles: ' + ', '.join(map(str, open_file_handles)))
Note: This should always work, assuming you're actually (occasionally) running out of file handles. The soft limit is usually fairly small (commonly 256 on macOS and 1024 on many Linux systems). But the loop might take a long time if the limit (set by the OS/user policy) is something huge like a billion.
Note also: There will almost always be at least three file handles open for STDIN, STDOUT, and STDERR respectively.
Upvotes: 0
Reputation: 535
The easiest way to trace the open calls is to use an audit hook in Python (available since Python 3.8). Note that this method only traces Python-level open calls, not the underlying system calls.
Let fdmod.py be a module file with a single function foo:
def foo():
    return open("/dev/zero", mode="r")
The main code in fd_trace.py, which traces all open calls and imports fdmod, is defined as follows:
import sys
import inspect
import fdmod

def open_audit_hook(name, *args):
    if name == "open":
        print(name, *args, "was called:")
        caller = inspect.currentframe()
        while caller := caller.f_back:
            print(f"\tFunction {caller.f_code.co_name} "
                  f"in {caller.f_code.co_filename}:"
                  f"{caller.f_lineno}"
            )

sys.addaudithook(open_audit_hook)
# main code
fdmod.foo()
with open("/dev/null", "w") as dev_null:
    dev_null.write("hi")
fdmod.foo()
Running fd_trace.py prints the call stack whenever some component calls open:
% python3 fd_trace.py
open ('/dev/zero', 'r', 524288) was called:
    Function foo in /home/tkrennwa/fdmod.py:2
    Function <module> in fd_trace.py:17
open ('/dev/null', 'w', 524865) was called:
    Function <module> in fd_trace.py:18
open ('/dev/zero', 'r', 524288) was called:
    Function foo in /home/tkrennwa/fdmod.py:2
    Function <module> in fd_trace.py:20
See sys.addaudithook and inspect.currentframe for details.
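With thousands of media files, printing a full stack for every open call may be too noisy. A variation on the hook above, sketched here under the assumption that a per-call-site tally is enough, counts opens by the file and line of the immediate caller. The names open_sites, count_open_sites, and leaky_reader are hypothetical and only serve the illustration:

```python
import inspect
import os
import sys
from collections import Counter

# (filename, line number) of the immediate caller -> number of open calls
open_sites = Counter()


def count_open_sites(event, args):
    """Audit hook: tally the call site of every Python-level open."""
    if event != "open":
        return
    frame = inspect.currentframe()
    caller = frame.f_back if frame is not None else None
    if caller is not None:
        open_sites[(caller.f_code.co_filename, caller.f_lineno)] += 1


sys.addaudithook(count_open_sites)


def leaky_reader(path):
    # Hypothetical stand-in for a library function that never closes its file.
    return open(path)


if __name__ == "__main__":
    handles = [leaky_reader(os.devnull) for _ in range(3)]
    # Show the call sites responsible for the most opens.
    for (filename, lineno), count in open_sites.most_common(5):
        print(f"{count:5d} opens at {filename}:{lineno}")
    for handle in handles:
        handle.close()
```

The tally shows where files are opened, not whether they are later closed, so the call sites with the largest counts are only candidates to inspect. Audit hooks cannot be removed once added, so this is better suited to a debugging run than to production code.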
Upvotes: 8
Reputation: 48111
You might get useful information using strace. This will show all system calls made by a process, including calls to open(). It will not directly show you where in the Python code those calls are occurring, but you may be able to deduce some information from the context.
Upvotes: 1