Reputation: 13381
I have a web server in Python (2.7) that uses Popen to delegate some work to a child process:
import tempfile
from subprocess import Popen, PIPE

url_arg = "http://localhost/index.html?someparam=somevalue"
call = ('phantomjs', 'some/phantom/script.js', url_arg)
imageB64data = tempfile.TemporaryFile()
errordata = tempfile.TemporaryFile()
p = Popen(call, stdout=imageB64data, stderr=errordata, stdin=PIPE)
p.communicate(input="")
I am seeing intermittent issues where, after roughly 64 of these Popen calls, the process runs out of file descriptors and is unable to function: it becomes completely unresponsive, and all threads seem to block forever if they attempt to open any files or sockets.
(Possibly relevant: the phantomjs child process loads a URL that calls back into the server that spawned it.)
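One way to observe the leak (a minimal sketch, assuming Linux, since it inspects /proc) is to count the process's open descriptors after each request:

import os

# Number of file descriptors currently open in this process (Linux only).
print len(os.listdir('/proc/self/fd'))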
Based on this Python bug report, I believe I need to set close_fds=True on all Popen calls from inside my server process in order to mitigate the leaking of file descriptors. However, I am unfamiliar with the machinery around exec-ing subprocesses and the inheritance of file descriptors, so much of the Popen documentation and the notes in the aforementioned bug report are unclear to me.
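Concretely, the change I believe I need (a one-line sketch, reusing the variables from the snippet above) would be:

p = Popen(call, stdout=imageB64data, stderr=errordata, stdin=PIPE,
          close_fds=True)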
It sounds like close_fds=True would actually close all open file descriptors (which include active request sockets, log file handles, etc.) in my process before executing the subprocess. This sounds strictly better than leaking the sockets, but would still result in errors.
However, in practice, when I use close_fds=True during a web request, it seems to work fine, and so far I have been unable to construct a scenario where it actually closes any other request sockets, database connections, etc.
The docs state:
If close_fds is true, all file descriptors except 0, 1 and 2 will be closed before the child process is executed.
So my question is: is it "safe" and "correct" to pass close_fds=True to Popen in a multithreaded Python web server? Or should I expect this to have side effects if other requests are doing file/socket IO at the same time?
Upvotes: 3
Views: 12219
Reputation: 181
I suspect that close_fds solves the problem of file descriptors leaking to subprocesses. Imagine opening a file and then running some task using subprocess. Without close_fds, the file descriptor is copied to the subprocess, so even if the parent process closes the file, the file remains open because of the subprocess. Now, say we want to delete the directory containing the file from another thread, using shutil.rmtree. On a regular filesystem this is not an issue: the directory is removed as expected. However, when the file resides on NFS, the following happens: first, Python tries to delete the file; since the file is still in use, it is instead renamed to .nfsXXX, where XXX is a long hexadecimal number. Next, Python tries to delete the directory, but that has become impossible, because the .nfsXXX file still resides in it.
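To make the inheritance concrete, here is a minimal sketch (mine, not from the original answer) showing a child holding on to a descriptor the parent has already closed. It assumes Linux, since it inspects /proc; demo.txt and the sleep children are illustrative only:

import os
import subprocess

fp = open('demo.txt', 'w')  # arbitrary scratch file

# Without close_fds, the child inherits a copy of fp's descriptor.
leaky = subprocess.Popen(['sleep', '5'], close_fds=False)
# With close_fds=True, every descriptor above 2 is closed in the child.
tight = subprocess.Popen(['sleep', '5'], close_fds=True)

fp.close()  # the parent's copy is gone, but the leaky child's is not

for name, child in (('leaky', leaky), ('tight', tight)):
    # Each entry in /proc/<pid>/fd is one open descriptor in that child.
    print name, os.listdir('/proc/%d/fd' % child.pid)

leaky.wait()
tight.wait()

The leaky child lists an extra descriptor beyond stdin/stdout/stderr; as long as it lives, the file (and, on NFS, its .nfsXXX ghost) cannot truly go away.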
Upvotes: 2
Reputation: 13381
I tried the following test with the subprocess32 backport of Python 3.2/3.3's subprocess:
import tempfile
import subprocess32 as subprocess

fp = open('test.txt', 'w')   # opened *before* the Popen call
fp.write("some stuff")
echoed = tempfile.TemporaryFile()
p = subprocess.Popen(("echo", "this", "stuff"), stdout=echoed,
                     close_fds=True)
p.wait()
echoed.seek(0)
fp.write("whatevs")          # fp is still usable after close_fds=True
fp.write(echoed.read())
fp.close()
and I got the expected result of some stuffwhatevsthis stuff in test.txt.
So it appears that the close in close_fds does not mean that open files (sockets, etc.) in the parent process become unusable after executing a child process; the descriptors are closed only in the child.
Also worth noting: subprocess32 defaults to close_fds=True on POSIX systems, AFAICT. This implies to me that it is not as dangerous as it sounds.
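And if a specific descriptor does need to survive into the child, the backport (like Python 3.2+) accepts a pass_fds whitelist alongside close_fds=True. A minimal sketch, assuming a POSIX system; the pipe and inline child script are illustrative only:

import os
import subprocess32 as subprocess

r, w = os.pipe()
# pass_fds keeps w open in the child even though close_fds=True;
# the child writes back through the inherited descriptor.
p = subprocess.Popen(
    ("python", "-c",
     "import os, sys; os.write(int(sys.argv[1]), 'hi from child')",
     str(w)),
    pass_fds=(w,), close_fds=True)
os.close(w)              # parent's copy is no longer needed
print os.read(r, 100)    # prints 'hi from child'
p.wait()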
Upvotes: 3