Reputation: 666
How can I upload a data frame as a zipped CSV to an S3 bucket without saving it on my local machine first?
I have the connection to that bucket already running using:
self.s3_output = S3(bucket_name='test-bucket', bucket_subfolder='')
Upvotes: 2
Views: 1533
Reputation: 1
This streams the CSV through an in-memory gzip buffer; the same pattern works for zip archives if you swap in the zipfile module:
import boto3
import gzip
import pandas as pd
from io import BytesIO, TextIOWrapper
s3_client = boto3.client(
    service_name="s3",
    endpoint_url=your_endpoint_url,
    aws_access_key_id=your_access_key,
    aws_secret_access_key=your_secret_key,
)
# Filename recorded inside the gzip header
your_filename = "test.csv"
s3_path = "path/to/your/s3/compressed/file/test.csv.gz"
bucket = "your_bucket"
df = your_df
gz_buffer = BytesIO()
# Write the CSV through a text wrapper into the in-memory gzip stream
with gzip.GzipFile(filename=your_filename, mode='w', fileobj=gz_buffer) as gz_file:
    df.to_csv(TextIOWrapper(gz_file, 'utf8'), index=False)
s3_client.put_object(
    Bucket=bucket, Key=s3_path, Body=gz_buffer.getvalue()
)
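To sanity-check the upload, pandas can read the compressed object straight back. This is a minimal sketch, assuming the same bucket and s3_path as above and that the optional s3fs package is installed (which pandas needs for s3:// URLs; a custom endpoint_url would additionally require storage_options):
import pandas as pd
# pandas infers gzip compression from the .gz suffix;
# s3:// URLs require the s3fs package and valid credentials
df_check = pd.read_csv(f"s3://{bucket}/{s3_path}")
print(df_check.head())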
Upvotes: 0
Reputation: 4265
We can build an in-memory zip archive with BytesIO and zipfile from the standard library.
# Python 3.7
from io import BytesIO
import zipfile
# .to_csv returns a string when called with no args
s = df.to_csv()
# Keep a reference to the buffer so it can be uploaded after the archive is closed
buffer = BytesIO()
with zipfile.ZipFile(buffer, mode="w") as z:
    z.writestr("df.csv", s)
# Rewind the buffer before handing it to the uploader
buffer.seek(0)
You'll want to refer to upload_fileobj in order to customize how the upload behaves.
yourclass.s3_output.upload_fileobj(buffer, ...)
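If you're calling boto3 directly instead of the question's S3 wrapper, the rewound buffer can go straight to the client's upload_fileobj. A sketch, with the bucket and key as placeholders:
import boto3
s3 = boto3.client("s3")
# Streams the in-memory zip to S3; handles multipart upload for large objects
s3.upload_fileobj(buffer, "your_bucket", "path/to/df.zip")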
Upvotes: 1