Reputation: 11598
I need to run a subprocess pipeline that uses zstandard-compressed files (too large to fit in memory) as both its input and output. Consider the following example:
import subprocess
import zstandard
with zstandard.open('a.txt.zst', 'w') as f:
    f.write('hello\n')
f_in = zstandard.open('a.txt.zst', 'rb')
f_out = zstandard.open('b.txt.zst', 'wb')
# in reality I'd be running multiple programs here by chaining PIPEs, but first
# reads f_in and last writes to f_out:
subprocess.call(['cat'], stdin=f_in, stdout=f_out)
I'm getting the following error:
Traceback (most recent call last):
  File "/tmp/a.py", line 12, in <module>
    subprocess.call(['cat'], stdin=f_in, stdout=f_out)
  File "/usr/lib/python3.11/subprocess.py", line 389, in call
    with Popen(*popenargs, **kwargs) as p:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/subprocess.py", line 892, in __init__
    errread, errwrite) = self._get_handles(stdin, stdout, stderr)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/subprocess.py", line 1661, in _get_handles
    p2cread = stdin.fileno()
              ^^^^^^^^^^^^
AttributeError: 'zstd.ZstdDecompressionReader' object has no attribute 'fileno'
I'm thinking of using PIPEs at both ends and feeding them with threads, but it feels rather fragile. Is there a more idiomatic solution to this problem?
Upvotes: 1
Views: 51
Reputation: 519
The stdin and stdout arguments of subprocess.call can only take a file-like object if it has a valid file descriptor (as documented here), so that won't work when the (de)compression is done in Python – at the system level there is no file containing the decompressed data.
In general, piping from a thread is a good solution (especially if it helps you avoid temporary files), but here it might be simpler to call zstdcat as yet another program in your chain for the input, and zstd with the right options for the output.
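For example, a minimal sketch of that chain, assuming the zstd command-line tools are installed; the file names and the cat step are just placeholders for your real pipeline:
import subprocess
with open('b.txt.zst', 'wb') as raw_out:
    # zstdcat decompresses a.txt.zst onto a real pipe file descriptor
    decompress = subprocess.Popen(['zstdcat', 'a.txt.zst'], stdout=subprocess.PIPE)
    # the actual work goes here; 'cat' stands in for your real program(s)
    work = subprocess.Popen(['cat'], stdin=decompress.stdout, stdout=subprocess.PIPE)
    # zstd re-compresses the result, writing straight into b.txt.zst
    compress = subprocess.Popen(['zstd', '-q'], stdin=work.stdout, stdout=raw_out)
    # close the parent's copies of the pipe ends so the children see EOF
    decompress.stdout.close()
    work.stdout.close()
    compress.wait()
    work.wait()
    decompress.wait()
Every stdin/stdout in that chain is a real pipe, so the data streams through without ever being held fully in memory or written to disk uncompressed, and no threads are needed.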
Upvotes: 2