camelBack

Reputation: 778

CSV file upload from buffer to S3

I am trying to upload content taken from a Django model as a CSV file. I don't want to save the file locally; I want to keep it in a buffer and upload it straight to S3. Currently, this code runs without errors and creates the object in S3, however, the uploaded file is empty.

import csv
import datetime
import io

import boto3

file_name = 'some_file.csv'
fields = [list_of_fields]
header = [header_fields]

buff = io.StringIO()
writer = csv.writer(buff, dialect='excel', delimiter=',')
writer.writerow(header)
for value in some_queryset:
    row = []
    for field in fields:
        row.append(getattr(value, field))  # filling in the row
    writer.writerow(row)

# Upload to s3
client = boto3.client('s3')
bucket = 'some_bucket_name'
date_time = datetime.datetime.now()
date = date_time.date()
time = date_time.time()
dt = '{year}_{month}_{day}__{hour}_{minute}_{second}'.format(
    day=date.day,
    hour=time.hour,
    minute=time.minute,
    month=date.month,
    second=time.second,
    year=date.year,
)
key = 'some_name_{0}.csv'.format(dt)

client.upload_fileobj(buff, bucket, key)

If I inspect the buffer's content, the CSV data is definitely being written to it:

content = buff.getvalue()
content.encode('utf-8')
print("content: {0}".format(content)) # prints the csv content

EDIT: I am doing a similar thing with a zip file, created in a buffer:

with zipfile.ZipFile(buff, 'w') as archive:

I write to the archive (adding PDF files that I am generating), and once I am done I call buff.seek(0), which seems to be necessary. If I do the same thing in the CSV code above, it errors out with: Unicode-objects must be encoded before hashing
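For context, the zip flow presumably looks something like this (a sketch with placeholder names; pdf_bytes stands in for the generated PDF content):

import io
import zipfile

import boto3

# Build the archive in an in-memory bytes buffer (zipfile needs a binary file object).
buff = io.BytesIO()
with zipfile.ZipFile(buff, 'w') as archive:
    # pdf_bytes is a placeholder for the generated PDF content (bytes).
    archive.writestr('some_file.pdf', pdf_bytes)

# Rewind so upload_fileobj reads from the start of the buffer.
buff.seek(0)

client = boto3.client('s3')
client.upload_fileobj(buff, 'some_bucket_name', 'some_name.zip')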

Upvotes: 18

Views: 14909

Answers (4)

Pascal Louis-Marie

Reputation: 252

As explained here, using the put_object method rather than upload_fileobj will do the job with an io.StringIO buffer.

So here, to match the initial example:

client = boto3.client('s3')
client.upload_fileobj(buff2, bucket, key)

would become

client = boto3.client('s3')
client.put_object(Body=buff2, Bucket=bucket, Key=key, ContentType='application/vnd.ms-excel')
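For reference, a complete version of that flow might look like this (a sketch on my part; the bucket and key names are placeholders, and the buffer's content is encoded to bytes explicitly here to avoid any encoding ambiguity):

import csv
import io

import boto3

buff = io.StringIO()
writer = csv.writer(buff, dialect='excel', delimiter=',')
writer.writerow(["a", "b", "c"])

client = boto3.client('s3')
client.put_object(
    Body=buff.getvalue().encode('utf-8'),  # pass the CSV content as bytes
    Bucket='some_bucket_name',
    Key='some_name.csv',
    ContentType='application/vnd.ms-excel',
)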

Upvotes: 5

Thomite

Reputation: 741

Okay, disregard my earlier answer; I found the actual problem.

According to the boto3 documentation for the upload_fileobj function, the first parameter (Fileobj) needs to implement a read() method that returns bytes:

Fileobj (a file-like object) -- A file-like object to upload. At a minimum, it must implement the read method, and must return bytes.

The read() function on a _io.StringIO object returns a string, not bytes. I would suggest swapping the StringIO object for a BytesIO object, adding in the necessary encoding and decoding.

Here is a minimal working example. It's not the most efficient solution - the basic idea is to copy the contents over to a second BytesIO object.

import io
import boto3
import csv

buff = io.StringIO()

writer = csv.writer(buff, dialect='excel', delimiter=',')
writer.writerow(["a", "b", "c"])

buff2 = io.BytesIO(buff.getvalue().encode())

bucket = 'changeme'
key = 'blah.csv'

client = boto3.client('s3')
client.upload_fileobj(buff2, bucket, key)
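An alternative sketch that avoids the second buffer (my own variation, not part of the answer above) is to wrap a single BytesIO in an io.TextIOWrapper, so the csv writer writes text while the underlying buffer holds bytes:

import csv
import io

import boto3

buff = io.BytesIO()

# Wrap the bytes buffer in a text layer so csv.writer can write strings to it.
text_buff = io.TextIOWrapper(buff, encoding='utf-8', newline='')
writer = csv.writer(text_buff, dialect='excel', delimiter=',')
writer.writerow(["a", "b", "c"])

# Flush the text layer and rewind the underlying bytes buffer before uploading.
text_buff.flush()
buff.seek(0)

client = boto3.client('s3')
client.upload_fileobj(buff, 'changeme', 'blah.csv')

For a small CSV either approach is fine; the copy in the example above only matters for very large files.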

Upvotes: 25

khc

Reputation: 354

You can use something like goofys to redirect output to S3.

Upvotes: 0

Thomite

Reputation: 741

Have you tried calling buff.flush() first? It's possible that your (entirely sensible) debugging check of calling getvalue() creates the illusion that buff has been written to, when it actually hasn't been in the runs where you don't call it.
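For completeness, this is what that suggestion would look like applied to the original snippet (my own illustration; note that the accepted answer above indicates the buffer ultimately needs to yield bytes either way):

buff.flush()  # make sure everything the csv writer buffered reaches the StringIO
buff.seek(0)  # rewind so the upload starts reading from the beginning
client.upload_fileobj(buff, bucket, key)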

Upvotes: 1
