Reputation: 39
Like the title says, is there a way to extract a tar.gz archive without writing it to disk (the archive is downloaded from the internet)? In bash or any other shell, I can just pipe the output of curl or wget to tar:
curl -L "https://somewebsite.com/file.tar.gz" | tar xzf -
Could I do something like this in Python as well?
Edit: I'm using urllib to download the data. Currently I'm doing something like this to download it and write it to a file:
from urllib.request import urlopen
filename = "/home/bob/file.tar.gz"
url = "https://website.com/file.tar.gz"
file = open(filename, "wb")
file.write(urlopen(url).read())
file.close()
Upvotes: 2
Views: 2718
Reputation: 39
With help from kenny's comment, I got this working: read the data returned by urlopen, wrap it in a BytesIO, and pass that as the fileobj argument to tarfile.open:
from urllib.request import urlopen
import tarfile
from io import BytesIO
r = urlopen("https://url/file.tar.gz")
t = tarfile.open(name=None, fileobj=BytesIO(r.read()))
t.extractall("/somedirectory/")
t.close()
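Reading the whole response into a BytesIO keeps the entire archive in memory. A closer analogue of `curl | tar xzf -` is tarfile's stream mode (`"r|gz"`), which reads members sequentially and never seeks, so the (non-seekable) urlopen response can be handed to tarfile directly. This is a minimal sketch: `extract_tar_stream` and `download_and_extract` are hypothetical helper names, and the `filter="data"` argument (which rejects absolute paths, `..` components and similar unsafe members) is only passed on Python versions that provide extraction filters, i.e. those where `tarfile.data_filter` exists.

```python
import tarfile
from urllib.request import urlopen


def extract_tar_stream(fileobj, dest):
    """Extract a gzip-compressed tar read sequentially from fileobj into dest."""
    # "r|gz" is stream mode: members are read in order and the file object
    # is only ever read from, so an HTTP response object works as input.
    with tarfile.open(fileobj=fileobj, mode="r|gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Extraction filters are available: refuse unsafe member paths.
            tar.extractall(dest, filter="data")
        else:
            # Older Python without filters: extract as-is (trusted input only).
            tar.extractall(dest)


def download_and_extract(url, dest):
    # The response is streamed straight into tarfile; only the extracted
    # members are written to disk, never the .tar.gz itself.
    with urlopen(url) as response:
        extract_tar_stream(response, dest)
```

Usage would be `download_and_extract("https://url/file.tar.gz", "/somedirectory/")`. The trade-off is that stream mode cannot jump back to earlier members, so it suits "extract everything in order" but not random access via `getmember()`.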
Upvotes: 1
Reputation: 81
If you just want to avoid writing the tar file to disk, you could use Python's subprocess module to run the shell pipeline for you:
import subprocess

# some params
shell_cmd = 'curl -L "https://somewebsite.com/file.tar.gz" | tar xzf -'
i_trust_this_string_cmd = True  # shell=True: only for commands you fully control
throw_error_on_fail = True
timeout_after_seconds = 10  # or None
convert_output_from_bytes_to_string = True

# run the shell command as a subprocess of this one and collect the results
try:
    cp = subprocess.run(
        shell_cmd,  # with shell=True, pass the command as a single string
        shell=i_trust_this_string_cmd,
        check=throw_error_on_fail,  # raises CalledProcessError on a non-zero exit
        timeout=timeout_after_seconds,
        capture_output=True,  # without this, cp.stdout and cp.stderr are None
        text=convert_output_from_bytes_to_string,
    )
    print(cp.stdout)  # if you want to see the output (text in this case)
except subprocess.CalledProcessError as cpe:
    print(cpe, cpe.stderr)
except subprocess.TimeoutExpired as te:
    print(te)
If you want more control, you can redirect stdout and stderr yourself, for example to log files:
with open('/tmp/stdout.txt', 'w+') as stdout:
    with open('/tmp/stderr.txt', 'w+') as stderr:
        cp = subprocess.run([...], stdout=stdout, stderr=stderr)
        ...
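The subprocess documentation also shows how to replace a shell pipeline without `shell=True`: start both programs with `Popen` and connect the first one's stdout to the second one's stdin. A minimal sketch of that pattern, assuming `curl` and `tar` are on the PATH; `run_pipeline` and `curl_into_tar` are hypothetical helper names, and the extra curl flags (`-f` to fail on HTTP errors, `-sS` for quiet output that still reports errors) are an addition to the original command.

```python
import subprocess


def run_pipeline(producer, consumer, cwd=None):
    """Run `producer | consumer` without a shell; return both exit codes."""
    # The producer's stdout feeds the consumer's stdin through an OS pipe,
    # so the data passes between the two processes without touching disk.
    p1 = subprocess.Popen(producer, stdout=subprocess.PIPE, cwd=cwd)
    p2 = subprocess.Popen(consumer, stdin=p1.stdout, cwd=cwd)
    # Close the parent's copy of the read end so the producer sees a
    # broken pipe if the consumer exits early, instead of blocking.
    p1.stdout.close()
    consumer_rc = p2.wait()
    producer_rc = p1.wait()
    return producer_rc, consumer_rc


def curl_into_tar(url, dest):
    # Equivalent of: cd dest && curl -fsSL URL | tar xzf -
    rcs = run_pipeline(["curl", "-fsSL", url], ["tar", "xzf", "-"], cwd=dest)
    if rcs != (0, 0):
        raise RuntimeError(f"pipeline failed with exit codes {rcs}")
```

Unlike the `sh -c` version, whose exit status is normally just tar's, this reports curl's status separately, and the URL is passed as a single argument rather than being parsed by a shell.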
Upvotes: -1