John

Reputation: 2922

Is it possible to add raw bytes to a TarFile object in python 3?

I'm creating a Python script that does a backup of various files, and data on my server.

It looks something like this:

#!/usr/bin/env python3

import subprocess
import tarfile
import os

DIRS_TO_BACKUP = []
FILES_TO_BACKUP = []
backup_destination = "/tmp/out.tar.gz"

# Code that adds directories to DIRS_TO_BACKUP
DIRS_TO_BACKUP.append("/opt/PROJECT_DIR/...")

# Code that adds files to FILES_TO_BACKUP
FILES_TO_BACKUP.append("/etc/SOME_FILE")

# Code to backup my database
db_table = subprocess.run(['mysqldump', 'my_database'], stdout=subprocess.PIPE).stdout

with tarfile.open(backup_destination, "w:gz") as tar:
    for DIR in DIRS_TO_BACKUP:
        tar.add(DIR, arcname=os.path.basename(DIR))

    for FILE in FILES_TO_BACKUP:
        tar.add(FILE, arcname=os.path.basename(FILE))

    # Code to save db_table (<class 'bytes'>) to tar somehow

Here, db_table is the raw bytes that represent my database dump. I want to give this data a filename and save it in my output tar.gz file as a regular file. Is this possible without first saving db_table to the filesystem?

Upvotes: 2

Views: 1986

Answers (2)

kawingkelvin

Reputation: 3951

This worked for me when adding an image that was extracted as a string tensor from a TFRecord; no intermediate file needs to be written to disk:

import tarfile
from io import BytesIO

import tensorflow as tf
from tqdm import tqdm

# ds is a tf.data.Dataset yielding (image, filename, ...) examples
with tarfile.open('image.tar.gz', 'w:gz') as tar:
    for img, filename, *_ in tqdm(ds.take(5)):    # save 5 for testing
        fname = filename.numpy().decode('utf-8')

        # Encode the image tensor as JPEG bytes and wrap them in a file object
        img = tf.io.encode_jpeg(img, quality=100)
        img_fileobj = BytesIO(img.numpy())

        # TarInfo supplies the member's metadata; size must match the payload
        tarinfo = tarfile.TarInfo(name=fname)
        tarinfo.size = img_fileobj.getbuffer().nbytes

        tar.addfile(tarinfo, img_fileobj)
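
The same pattern works for any in-memory bytes, not just TensorFlow tensors. A minimal sketch (the payload and filenames below are just placeholders):

import tarfile
from io import BytesIO

payload = b"hello from memory"                # any bytes already held in RAM
fileobj = BytesIO(payload)

with tarfile.open('bytes.tar.gz', 'w:gz') as tar:
    info = tarfile.TarInfo(name='hello.txt')  # name of the member inside the archive
    info.size = len(payload)                  # addfile reads exactly this many bytes
    tar.addfile(info, fileobj)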

Upvotes: 3

mCoding

Reputation: 4849

As you can see in the tarfile docs (https://docs.python.org/3/library/tarfile.html), you can add a file object to a tar using addfile. Wrap your bytes in an io.BytesIO, build a tarfile.TarInfo for the archive member, set its size, and pass both to addfile. (Note that gettarinfo won't work here: it expects a real file with a name and a fileno(), which a BytesIO doesn't have.)

#!/usr/bin/env python3

import subprocess
import tarfile
import os
import io

DIRS_TO_BACKUP = []
FILES_TO_BACKUP = []
backup_destination = "/tmp/out.tar.gz"

# Code that adds directories to DIRS_TO_BACKUP
DIRS_TO_BACKUP.append("/opt/PROJECT_DIR/...")

# Code that adds files to FILES_TO_BACKUP
FILES_TO_BACKUP.append("/etc/SOME_FILE")

# Code to backup my database
db_table = subprocess.run(['mysqldump', 'my_database'], stdout=subprocess.PIPE).stdout
db_fileobj = io.BytesIO(db_table)

with tarfile.open(backup_destination, "w:gz") as tar:
    for DIR in DIRS_TO_BACKUP:
        tar.add(DIR, arcname=os.path.basename(DIR))

    for FILE in FILES_TO_BACKUP:
        tar.add(FILE, arcname=os.path.basename(FILE))

    # Save db_table (bytes) into the archive as a regular file
    db_info = tarfile.TarInfo(name="database")
    db_info.size = len(db_table)
    tar.addfile(db_info, fileobj=db_fileobj)

Upvotes: 0
