Reputation: 438
My Python application creates a subprocess for AWS CLI S3 upload.
command = 'aws s3 sync /tmp/tmp_dir s3://mybucket/tmp_dir'
# spawn the process
sp = subprocess.Popen(
    shlex.split(str(command)),
    stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# wait for a while
sp.wait()
out, err = sp.communicate()
if sp.returncode == 0:
    logger.info("aws return code: %s", sp.returncode)
    logger.info("aws cli stdout `{}`".format(out))
    return
# handle error
/tmp/tmp_dir is ~0.5 GB and contains about 100 files.
The upload takes ~25 minutes, which is extremely slow.
If I run the same aws command directly in the shell (without Python), it takes less than 1 minute.
What's wrong? Any help is appreciated.
Upvotes: 1
Views: 1460
Reputation: 1579
I noticed a warning in the documentation about wait()
usage (see below). However, instead of debugging this, why not rewrite it to use the Python SDK (boto3) instead of shelling out to the aws CLI? You will probably get better performance and cleaner code.
https://boto3.readthedocs.io/en/latest/guide/s3.html
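As a rough sketch of that approach (untested here; boto3 has no single-call equivalent of s3 sync, so this simply uploads every file under the directory, using the bucket name and prefix from your question):

import os
import boto3

s3 = boto3.client('s3')
local_dir = '/tmp/tmp_dir'
bucket = 'mybucket'  # from the question; replace with your bucket

for root, _, files in os.walk(local_dir):
    for name in files:
        path = os.path.join(root, name)
        # the key mirrors the local directory layout under the tmp_dir/ prefix
        key = os.path.join('tmp_dir', os.path.relpath(path, local_dir))
        # upload_file handles multipart uploads for large files automatically
        s3.upload_file(path, bucket, key)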
Warning This will deadlock when using stdout=PIPE and/or stderr=PIPE and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use communicate() to avoid that.
https://docs.python.org/2/library/subprocess.html
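Following that warning, the minimal fix for your snippet is to drop the wait() call and let communicate() do the waiting, so both pipes are drained while the child runs:

import shlex
import subprocess

command = 'aws s3 sync /tmp/tmp_dir s3://mybucket/tmp_dir'
sp = subprocess.Popen(
    shlex.split(command),
    stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# communicate() reads stdout/stderr to completion and then waits for exit,
# so it cannot stall the way wait() with full OS pipe buffers can
out, err = sp.communicate()
if sp.returncode != 0:
    pass  # handle error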
Edit 3:
Here is a solution I just tested; it runs without blocking. subprocess also provides convenience functions that call wait() or communicate() under the hood and are easier to use, such as check_output:
#!/usr/bin/env python
import subprocess
from subprocess import CalledProcessError

command = ['aws', 's3', 'sync', '/tmp/test-sync', 's3://bucket-name/test-sync']

try:
    result = subprocess.check_output(command)
    print(result)
except CalledProcessError as err:
    # handle error; err.returncode is nonzero
    pass
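If the aws command exits with a nonzero status, check_output raises CalledProcessError; err.returncode holds the exit status and err.output holds whatever the command printed before failing.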
Upvotes: 1