Reputation: 778
I am trying to upload content taken out of a model in Django as a csv file. I don't want to save the file locally, but keep it in the buffer and upload to s3. Currently, this code does not error as is, and uploads the file properly, however, the file is empty.
import csv
import datetime
import io

import boto3

file_name = 'some_file.csv'
fields = [list_of_fields]
header = [header_fields]

buff = io.StringIO()
writer = csv.writer(buff, dialect='excel', delimiter=',')
writer.writerow(header)
for value in some_queryset:
    row = []
    for field in fields:
        # filling in the row
        ...
    writer.writerow(row)

# Upload to s3
client = boto3.client('s3')
bucket = 'some_bucket_name'
date_time = datetime.datetime.now()
date = date_time.date()
time = date_time.time()
dt = '{year}_{month}_{day}__{hour}_{minute}_{second}'.format(
    day=date.day,
    hour=time.hour,
    minute=time.minute,
    month=date.month,
    second=time.second,
    year=date.year,
)
key = 'some_name_{0}.csv'.format(dt)
client.upload_fileobj(buff, bucket, key)
If I inspect the buffer's contents, the data is definitely being written:

content = buff.getvalue()
content.encode('utf-8')  # note: encode() returns new bytes; the result is discarded here
print("content: {0}".format(content))  # prints the csv content
EDIT: I am doing a similar thing with a zip file, created in a buffer:

with zipfile.ZipFile(buff, 'w') as archive:
    ...

I write to the archive (adding PDF files that I am generating), and once I am done, I execute buff.seek(0), which seems to be necessary. If I do the same with the CSV buffer above, it errors out with: Unicode-objects must be encoded before hashing.
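For reference, a runnable sketch of that zip variant (the bucket, key, and archive contents are placeholders for what my code actually generates):

import io
import zipfile

import boto3

buff = io.BytesIO()  # zipfile writes bytes, so a bytes buffer needs no encoding step
with zipfile.ZipFile(buff, 'w') as archive:
    # stand-in for the PDFs generated elsewhere
    archive.writestr('example.pdf', b'%PDF-1.4 placeholder')
buff.seek(0)  # rewind so upload_fileobj reads from the start of the buffer
boto3.client('s3').upload_fileobj(buff, 'some_bucket_name', 'some_archive.zip')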
Upvotes: 18
Views: 14909
Reputation: 252
As explained here, using the method put_object rather than upload_fileobj does the job with an io.StringIO buffer.

So, adapting the example from the other answer:

client = boto3.client('s3')
client.upload_fileobj(buff2, bucket, key)

would become

client = boto3.client('s3')
client.put_object(Body=buff2, Bucket=bucket, Key=key, ContentType='application/vnd.ms-excel')
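A self-contained sketch of this approach (bucket and key are placeholders; the value is encoded explicitly here, although put_object also accepts a plain str body):

import csv
import io

import boto3

buff = io.StringIO()
writer = csv.writer(buff, dialect='excel', delimiter=',')
writer.writerow(['a', 'b', 'c'])

client = boto3.client('s3')
client.put_object(
    Body=buff.getvalue().encode('utf-8'),  # the CSV text as bytes
    Bucket='some_bucket_name',             # placeholder
    Key='some_name.csv',                   # placeholder
    ContentType='text/csv',                # or 'application/vnd.ms-excel' as above
)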
Upvotes: 5
Reputation: 741
Okay, disregard my earlier answer, I found the actual problem.

According to the boto3 documentation for the upload_fileobj function, the first parameter (Fileobj) needs to implement a read() method that returns bytes:

Fileobj (a file-like object) -- A file-like object to upload. At a minimum, it must implement the read method, and must return bytes.

The read() function on an io.StringIO object returns a string, not bytes. I would suggest swapping the StringIO object for a BytesIO object, adding in the necessary encoding and decoding.

Here is a minimal working example. It's not the most efficient solution - the basic idea is to copy the contents over to a second BytesIO object.
import csv
import io

import boto3

buff = io.StringIO()
writer = csv.writer(buff, dialect='excel', delimiter=',')
writer.writerow(["a", "b", "c"])

# Copy the text contents into a bytes buffer that upload_fileobj can read
buff2 = io.BytesIO(buff.getvalue().encode())

bucket = 'changeme'
key = 'blah.csv'
client = boto3.client('s3')
client.upload_fileobj(buff2, bucket, key)
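If the extra copy matters, a slightly more direct variant (a sketch, reusing the same placeholder bucket and key) wraps the BytesIO in io.TextIOWrapper so csv.writer writes encoded bytes into a single buffer:

import csv
import io

import boto3

raw = io.BytesIO()
# TextIOWrapper provides the text interface csv.writer needs,
# while the underlying BytesIO stores the encoded bytes.
text = io.TextIOWrapper(raw, encoding='utf-8', newline='')
writer = csv.writer(text, dialect='excel', delimiter=',')
writer.writerow(["a", "b", "c"])
text.flush()  # push buffered text through to the BytesIO
raw.seek(0)   # rewind so upload_fileobj reads from the beginning

client = boto3.client('s3')
client.upload_fileobj(raw, 'changeme', 'blah.csv')

Note that the wrapper must stay alive until the upload finishes; closing it (or letting it be garbage collected) also closes the underlying BytesIO.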
Upvotes: 25
Reputation: 741
Have you tried calling buff.flush() first? It's possible that your entirely-sensible debugging check (calling getvalue()) is creating the illusion that buff has been written to, while nothing is actually flushed to it when you don't make that call.
Upvotes: 1