Reputation:
I am running sed from a Python subprocess, but I get the following error:
"OSError: [Errno 7] Argument list too long: 'sed'"
The Python code is:
subprocess.run(['sed', '-i',
                '-e', 's/#/pau/g',
                *glob.glob('label_POS/label_phone_align/dump/*')], check=True)
The dump/ directory contains ~13,000 files. I have been told that I need to run the command on subsets of the argument list, but I can't find out how to do that.
Upvotes: 1
Views: 245
Reputation: 189467
Please scroll down to the end of this answer for the solution I recommend for your specific problem. There's a bit of background here for context and/or future visitors grappling with other "argument list too long" errors.
The exec() system call has a size limit; you cannot pass more than ARG_MAX bytes as arguments to a process, where this system constant's value can usually be queried with the getconf ARG_MAX command on modern systems. One way around the limit is to split the file list into batches that each fit under it:
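import glob
import subprocess

# Ask the system for the limit on the combined size of exec() arguments.
arg_max = subprocess.run(['getconf', 'ARG_MAX'],
                         text=True, check=True, capture_output=True
                         ).stdout.strip()
arg_max = int(arg_max)

cmd = ['sed', '-i', '-e', 's/#/pau/g']
files = glob.glob('label_POS/label_phone_align/dump/*')

while files:
    # Size of the fixed part of the command, one terminator byte per argument.
    # The environment also counts against ARG_MAX, so this is optimistic.
    base = sum(len(x) for x in cmd) + len(cmd)
    for l in range(len(files)):
        base += 1 + len(files[l])
        if base > arg_max:
            # This file no longer fits; the batch ends with the previous one.
            l -= 1
            break
    subprocess.run(cmd + files[0:l+1], check=True)
    files = files[l+1:]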
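For readers who want something easier to test, the batching logic above could be pulled out into a generator. This is only a sketch under a few assumptions: the names batches and run_sed_in_batches are made up for illustration, sizes are counted as encoded bytes plus one terminator per argument, and half of the limit is left as headroom for the environment, which is an arbitrary margin rather than a documented rule.

```python
import os
import subprocess


def batches(cmd, files, limit):
    """Yield lists of files so that cmd plus each list stays within limit bytes."""
    def size(arg):
        # Encoded length plus the terminating NUL byte.
        return len(os.fsencode(arg)) + 1

    fixed = sum(size(a) for a in cmd)
    batch, used = [], fixed
    for name in files:
        need = size(name)
        if batch and used + need > limit:
            yield batch
            batch, used = [], fixed
        # A single over-long name is still emitted on its own, so the loop
        # always makes progress; the exec call would then fail visibly.
        batch.append(name)
        used += need
    if batch:
        yield batch


def run_sed_in_batches(files, limit=None):
    """Hypothetical wrapper: run the sed command once per batch."""
    cmd = ['sed', '-i', '-e', 's/#/pau/g']
    if limit is None:
        # Assumption: keep half of ARG_MAX free for the environment.
        limit = os.sysconf('SC_ARG_MAX') // 2
    for batch in batches(cmd, files, limit):
        subprocess.run(cmd + batch, check=True)
```

Calling run_sed_in_batches(glob.glob('label_POS/label_phone_align/dump/*')) would then stand in for the while loop; os.sysconf('SC_ARG_MAX') is only available on Unix-like systems.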
Of course, the xargs command already does exactly this for you.
import glob
import subprocess

subprocess.run(
    ['xargs', '-r', '-0', 'sed', '-i', '-e', 's/#/pau/g'],
    input=b'\0'.join(
        [x.encode() for x in glob.glob('label_POS/label_phone_align/dump/*') + ['']]),
    check=True)
Simply removing the long path might be enough in your case, though. You are repeating label_POS/label_phone_align/dump/ in front of every file name in the argument array.
import glob
import os
import subprocess

path = 'label_POS/label_phone_align/dump'
files = [os.path.basename(file)
         for file in glob.glob(os.path.join(path, '*'))]
subprocess.run(
    ['sed', '-i', '-e', 's/#/pau/g', *files],
    cwd=path, check=True)
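To get a feel for how much dropping the prefix saves, here is a rough back-of-the-envelope sketch. The file names are invented stand-ins for the real directory listing, and arg_bytes is a hypothetical helper that counts the encoded length of each argument plus one terminator byte.

```python
import os


def arg_bytes(args):
    """Bytes the arguments occupy: encoded length plus one NUL terminator each."""
    return sum(len(os.fsencode(a)) + 1 for a in args)


prefix = 'label_POS/label_phone_align/dump'
# Hypothetical names; the real directory contents are not known here.
names = [f'file{i:05d}.lab' for i in range(13000)]
full = [os.path.join(prefix, n) for n in names]

# Each argument shrinks by the prefix plus the path separator.
saved = arg_bytes(full) - arg_bytes(names)
print(saved)
```

With 13,000 files, the saving is 13,000 times the length of the prefix and separator, which may or may not be enough depending on how long the remaining file names are.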
Ultimately, you might prefer a pure Python solution, which avoids the argument limit altogether.
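import fileinput
import glob

# With inplace=True, print() writes back into the file currently being read.
for line in fileinput.input(glob.glob('label_POS/label_phone_align/dump/*'), inplace=True):
    # The line keeps its own newline, so suppress the one print() would add.
    print(line.replace('#', 'pau'), end='')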
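If fileinput feels too magical, much the same thing could be written with pathlib. This is a sketch rather than a drop-in replacement: replace_in_files is a made-up name, it assumes the files are text in the default encoding and small enough to read into memory, and it skips anything that is not a regular file.

```python
from pathlib import Path


def replace_in_files(directory, old, new):
    """Rewrite every regular file in directory, replacing old with new.

    Returns the number of files that were actually modified.
    """
    changed = 0
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        text = path.read_text()
        updated = text.replace(old, new)
        # Only touch files whose contents really change.
        if updated != text:
            path.write_text(updated)
            changed += 1
    return changed
```

A call such as replace_in_files('label_POS/label_phone_align/dump', '#', 'pau') would then replace the sed invocation. Unlike sed -i, this leaves files without a match untouched.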
Upvotes: 0
Reputation: 24691
Whoever told you that probably meant that you need to split up the glob and run multiple separate commands:
import glob
import subprocess

files = glob.glob('label_POS/label_phone_align/dump/*')
i = 0
scale = 100
# process in units of 100 filenames until we have them all
while scale*i < len(files):
    subprocess.run(['sed', '-i',
                    '-e', 's/#/pau/g',
                    *files[scale*i:scale*(i+1)]], check=True)
    i += 1
and then amalgamate all that output however you need, after the fact. I don't know how many inputs the sed command can accept from the command line, but it's apparently fewer than 13,000. You can keep changing scale until it no longer errors.
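The index arithmetic in the loop above can also be hidden in a small helper. The following is a sketch, with chunked being a made-up name, and it slices by count just like the loop does, so it shares the same caveat that the right chunk size has to be found by trial.

```python
def chunked(items, size):
    """Yield consecutive slices of items, each with at most size elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Small demonstration with placeholder values instead of real file names.
demo = list(chunked(['a', 'b', 'c', 'd', 'e'], 2))
print(demo)
```

The sed call would then sit inside a loop like for batch in chunked(files, scale), passing *batch where the slice expression is now.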
Upvotes: 1