FriedTeaCP

Reputation: 39

Python 3: Extract tar.gz archive without writing to disk

As the title says, is there a way to extract a tar.gz archive without first writing the archive file to disk? (The archive is downloaded from the internet.) In bash or any other shell, I can just pipe the output of curl or wget to tar:

curl -L "https://somewebsite.com/file.tar.gz" | tar xzf -

Could I do something like this in Python as well?

Edit: I'm using urllib to download the data. I currently do something like this to download it and write it to a file:

from urllib.request import urlopen

filename = "/home/bob/file.tar.gz"
url      = "https://website.com/file.tar.gz"

# The with block closes the file even if the download fails
with open(filename, "wb") as file:
    file.write(urlopen(url).read())

Upvotes: 2

Views: 2718

Answers (2)

FriedTeaCP

Reputation: 39

Using help from kenny's comment, I did what I wanted by reading the data I got from urlopen into a BytesIO object and passing that as the fileobj argument to tarfile.open:

from urllib.request import urlopen
import tarfile
from io import BytesIO

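# Read the whole archive into memory; tarfile detects the gzip compression itself
r = urlopen("https://url/file.tar.gz")
t = tarfile.open(name=None, fileobj=BytesIO(r.read()))
# The archive never touches the disk, but the extracted files are written here
t.extractall("/somedirectory/")
t.close()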
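
A closer analogue of the curl | tar pipe might be to let tarfile read the response as a stream instead of buffering it all in BytesIO first. The following is a minimal sketch, not a definitive implementation: extract_stream and download_and_extract are hypothetical helper names, and it assumes the archive is gzip-compressed. The "r|gz" mode tells tarfile to read sequentially from a non-seekable file object, and the filter argument is only passed when the running Python version supports it.

```python
import tarfile
from urllib.request import urlopen


def extract_stream(fileobj, dest):
    """Extract a gzip-compressed tar stream from fileobj into dest."""
    # "r|gz" reads the archive sequentially, so fileobj only needs read()
    with tarfile.open(fileobj=fileobj, mode="r|gz") as archive:
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters can reject unsafe
            # members such as absolute paths or ".." components
            archive.extractall(dest, filter="data")
        else:
            archive.extractall(dest)


def download_and_extract(url, dest):
    """Hypothetical helper: stream the download straight into tarfile."""
    with urlopen(url) as response:
        extract_stream(response, dest)
```

With these helpers, download_and_extract("https://url/file.tar.gz", "/somedirectory/") would do the same job as the code above while holding only a small part of the archive in memory at a time.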

Upvotes: 1

Lactose.Int.Robot

Reputation: 81

To avoid writing the TAR file to disk, you could use Python's subprocess module to run the shell commands for you:

import subprocess

# some params
shell_cmd = 'curl -L "https://somewebsite.com/file.tar.gz" | tar xzf -'
i_trust_this_string_cmd = True
throw_error_on_fail = True
timeout_after_seconds = 10 # or None
convert_output_from_bytes_to_string = True
#

try:
    # run the shell pipeline as a subprocess of this one and get the results;
    # with check=True, run() itself raises CalledProcessError on a non-zero
    # exit status, so the call has to sit inside the try block
    cp = subprocess.run(
        shell_cmd, # a single string, since shell=True
        shell=i_trust_this_string_cmd,
        check=throw_error_on_fail,
        timeout=timeout_after_seconds,
        capture_output=True, # without this, cp.stdout is None
        text=convert_output_from_bytes_to_string
    )
    # status_code = cp.returncode
    print(cp.stdout) # if you want to see the output (text in this case)
except subprocess.CalledProcessError as cpe:
    print(cpe)
except subprocess.TimeoutExpired as te:
    print(te)

If you want more control, you can redirect STDOUT and STDERR yourself, e.g. to files:

with open('/tmp/stdout.txt', 'w+') as stdout:
    with open('/tmp/stderr.txt', 'w+') as stderr:
        cp = subprocess.run([...], stdout=stdout, stderr=stderr)
        ...
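
If the output should stay in memory rather than go to files under /tmp, subprocess.PIPE can be passed for both streams. A minimal sketch, assuming the command is given as an argument list without a shell: run_captured is a hypothetical helper name, and the child command in the usage part is only a stand-in for illustration.

```python
import subprocess
import sys


def run_captured(args, timeout=10):
    """Run args without a shell and return (returncode, stdout, stderr) as text."""
    completed = subprocess.run(
        args,
        stdout=subprocess.PIPE,  # collect standard output in memory
        stderr=subprocess.PIPE,  # collect standard error in memory
        text=True,               # decode bytes to str
        timeout=timeout,         # raises subprocess.TimeoutExpired if exceeded
    )
    return completed.returncode, completed.stdout, completed.stderr


if __name__ == "__main__":
    # Stand-in command: a child Python process that prints one line
    code, out, err = run_captured([sys.executable, "-c", "print('hello')"])
    print(code, out.strip())
```

Because check is not set here, a failing command does not raise; the caller inspects the returned status code instead.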

Upvotes: -1
