Reputation: 73
Here is my code.
import boto3
import pandas as pd
import requests
from io import StringIO
campaign_buffer = StringIO()
r = requests.get('https://.... output=csv....')
if r.status_code == 200:
    r.encoding = 'utf-8'
    request_txt = r.text
    campaigns = StringIO(request_txt)
    campaigns_pd = pd.read_csv(campaigns, sep=",")
    campaigns_pd.columns = campaigns_pd.columns.str.replace(':', '_')
    campaigns_pd.drop('images_thumb', inplace=True, axis=1)
    campaigns_pd.to_csv(campaign_buffer)
else:
    print('error')
bucket = 'name'
key = 'folder/test.csv'
client = boto3.client('s3')
client.upload_fileobj(campaign_buffer, bucket, key)
The last line of code caused this error: TypeError: Unicode-objects must be encoded before hashing
Any ideas how to solve the problem?
Upvotes: 5
Views: 8677
Reputation: 3200
I ran into a similar issue and would like to update the answer provided by @AKX with a couple of key pieces of context. My example uses the s3.resource() API. I will also include an example using the s3.client() API, but keep in mind that they provide different methods for performing similar tasks.
Here is an example using boto3.resource("s3"):
from io import BytesIO
import boto3
import pandas
from pandas import util
df = util.testing.makeMixedDataFrame()
s3_resource = boto3.resource("s3")
campaign_buffer = BytesIO()
df.to_csv(campaign_buffer, sep=",", index=False, mode="wb", encoding="UTF-8")
campaign_buffer.seek(0)  # Make sure the stream position is at the beginning! (seek on the buffer, not the DataFrame)
s3_resource.Object("test-bucket", "test_df_from_resource.csv").put(Body=campaign_buffer.getvalue())
You should then receive a confirmation message that looks similar to the following:
>> {'ResponseMetadata': {'RequestId': 'request-id-value',
     'HostId': '###########',
     'HTTPStatusCode': 200,
     'HTTPHeaders': {'x-amz-id-2': '############',
      'x-amz-request-id': '00000',
      'date': 'Tue, 31 Aug 2021 00:00:00 GMT',
      'x-amz-server-side-encryption': 'value',
      'etag': '"xxxx"',
      'server': 'AmazonS3',
      'content-length': '0'},
     'RetryAttempts': 0},
    'ETag': '"xxxx"',
    'ServerSideEncryption': 'value'}
And here is an example using boto3.client("s3"):
from io import BytesIO
import boto3
import pandas
from pandas import util
df = util.testing.makeMixedDataFrame()
s3_client = boto3.client("s3")
campaign_buffer = BytesIO()
df.to_csv(campaign_buffer, sep=",", index=False, mode="wb", encoding="UTF-8")
campaign_buffer.seek(0)  # Rewind the buffer before uploading (seek on the buffer, not the DataFrame)
s3_client.upload_fileobj(campaign_buffer, Bucket="test-bucket", Key="test_df_from_client.csv")
Hope seeing both examples helps anyone looking for an easy way to upload a Pandas dataframe to S3.
Upvotes: 2
Reputation: 169174
You're writing to a StringIO(), which has no intrinsic encoding, and you can't write something that can't be encoded into bytes to S3. To do this without having to re-encode whatever you've written to campaign_buffer:

- Make campaign_buffer a BytesIO() instead of a StringIO()
- Add mode="wb" and encoding="UTF-8" to the to_csv call
- Do campaign_buffer.seek(0) to rewind the in-memory file before uploading
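Applied to the code from the question, a minimal sketch of all three changes might look like this (untested; the placeholder URL, bucket name, and key are copied from the question, and writing to a binary buffer via mode="wb" needs pandas 1.2 or newer):

from io import BytesIO, StringIO
import boto3
import pandas as pd
import requests

campaign_buffer = BytesIO()  # 1. BytesIO() instead of StringIO()
r = requests.get('https://.... output=csv....')  # placeholder URL from the question
if r.status_code == 200:
    r.encoding = 'utf-8'
    campaigns_pd = pd.read_csv(StringIO(r.text), sep=",")
    campaigns_pd.columns = campaigns_pd.columns.str.replace(':', '_')
    campaigns_pd.drop('images_thumb', inplace=True, axis=1)
    # 2. mode="wb" and encoding="UTF-8" so pandas writes bytes into the buffer
    campaigns_pd.to_csv(campaign_buffer, mode="wb", encoding="UTF-8")
    campaign_buffer.seek(0)  # 3. rewind the in-memory file before uploading
    client = boto3.client('s3')
    client.upload_fileobj(campaign_buffer, 'name', 'folder/test.csv')
else:
    print('error')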
Upvotes: 6