L Xandor

Reputation: 1851

Python boto3 load model tar file from s3 and unpack it

I am using SageMaker and have a bunch of model.tar.gz files that I need to unpack and load in sklearn. I've been testing list_objects with a delimiter to get to the tar.gz files:

response = s3.list_objects(
    Bucket = bucket,
    Prefix = 'aleks-weekly/models/',
    Delimiter = '.csv'
)


for i in response['Contents']:
    print(i['Key'])

And then I plan to extract with

import tarfile
tf = tarfile.open(model.read())
tf.extractall()

But how do I get the actual tar.gz file from S3 instead of some boto3 object?

Upvotes: 2

Views: 6785

Answers (1)

charlesreid1

Reputation: 4841

You can download objects to files using s3.download_file(). This will make your code look like:

import os

import boto3

s3 = boto3.client('s3')
bucket = 'my-bukkit'
prefix = 'aleks-weekly/models/'

# List objects matching your criteria
response = s3.list_objects(
    Bucket = bucket,
    Prefix = prefix,
    Delimiter = '.csv'
)

# Iterate over each file found and download it
for i in response['Contents']:
    key = i['Key']
    dest = os.path.join('/tmp',key)
    # Keys contain slashes, so create the local directory tree first
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    print("Downloading file",key,"from bucket",bucket)
    s3.download_file(
        Bucket = bucket,
        Key = key,
        Filename = dest
    )
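
If you would rather not keep the archive on disk, one possible approach is to read the object body into memory and hand it to tarfile through fileobj=. The sketch below is only an illustration: the function name extract_model_archive and the dest_dir argument are hypothetical, and s3_client is assumed to behave like boto3.client('s3'). It also assumes the archives come from a source you trust, since extractall() writes whatever paths the archive contains.

```python
import io
import os
import tarfile


def extract_model_archive(s3_client, bucket, key, dest_dir):
    """Fetch one .tar.gz object from S3 and unpack it into dest_dir.

    s3_client is assumed to behave like boto3.client('s3').
    Returns the list of member names found in the archive.
    """
    # get_object returns a dict whose 'Body' is a file-like stream
    response = s3_client.get_object(Bucket=bucket, Key=key)
    data = response['Body'].read()

    os.makedirs(dest_dir, exist_ok=True)
    # tarfile.open() expects a path or a file object, not raw bytes,
    # so wrap the bytes in BytesIO and pass them via fileobj=
    with tarfile.open(fileobj=io.BytesIO(data), mode='r:gz') as tf:
        names = tf.getnames()
        tf.extractall(path=dest_dir)
    return names
```

You would call it inside the loop as extract_model_archive(s3, bucket, key, '/tmp/model'). The returned names tell you which file to pass to whatever loader matches how the model was saved (for sklearn models that is often joblib.load or pickle.load, but that depends on your training script).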

Upvotes: 2
