Rafalik

Reputation: 73

Saving a file-like object to S3, I get the error: Unicode-objects must be encoded before hashing

Here is my code.

import boto3
import pandas as pd
import requests
from io import StringIO

campaign_buffer=StringIO()

r = requests.get('https://.... output=csv....')

if r.status_code==200:
    r.encoding='utf-8'
    request_txt = r.text
    campaigns = StringIO(request_txt)
    campaigns_pd = pd.read_csv(campaigns, sep=",")
    campaigns_pd.columns=campaigns_pd.columns.str.replace(':','_')
    campaigns_pd.drop('images_thumb', inplace=True, axis=1)
    campaigns_pd.to_csv(campaign_buffer)
else:
    print('error')

bucket = 'name'
key = 'folder/test.csv'

client = boto3.client('s3')
client.upload_fileobj(campaign_buffer, bucket, key)

The last line of code raises the error: TypeError: Unicode-objects must be encoded before hashing

Any ideas how to solve the problem?

Upvotes: 5

Views: 8677

Answers (2)

Nathan

Reputation: 3200

I ran into a similar issue and would like to update the answer provided by @AKX. A couple of key pieces of context:

  • You will need a pandas version that includes the fix "support binary file handles in to_csv" (pandas 1.2.0 or newer). The examples below were tested on pandas 1.3.2, Python 3.8, and boto3 1.17.106.
  • The answer by @AKX and the following example use the high-level s3.resource() API. I will also include an example using the s3.client() API, but keep in mind that the two provide different methods for performing similar tasks.

Here is an example using boto3.resource("s3"):

from io import BytesIO
import boto3
import pandas
from pandas import util
df = util.testing.makeMixedDataFrame()
s3_resource = boto3.resource("s3")
campaign_buffer = BytesIO()
df.to_csv(campaign_buffer, sep=",", index=False, mode="wb", encoding="UTF-8")
campaign_buffer.seek(0)  # Make sure the stream position is at the beginning!
s3_resource.Object("test-bucket", "test_df_from_resource.csv").put(Body=campaign_buffer.getvalue())

You should then receive a confirmation message that looks similar to the following:

>> {'ResponseMetadata': {'RequestId': 'request-id-value',
  'HostId': '###########',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': '############',
   'x-amz-request-id': '00000',
   'date': 'Tue, 31 Aug 2021 00:00:00 GMT',
   'x-amz-server-side-encryption': 'value',
   'etag': '"xxxx"',
   'server': 'AmazonS3',
   'content-length': '0'},
  'RetryAttempts': 0},
 'ETag': '"xxxx"',
 'ServerSideEncryption': 'value'}

And here is an example using boto3.client("s3"):

from io import BytesIO
import boto3
import pandas
from pandas import util
df = util.testing.makeMixedDataFrame()
s3_client = boto3.client("s3")
campaign_buffer = BytesIO()
df.to_csv(campaign_buffer, sep=",", index=False, mode="wb", encoding="UTF-8")
campaign_buffer.seek(0)  # rewind before uploading
s3_client.upload_fileobj(campaign_buffer, Bucket="test-bucket", Key="test_df_from_client.csv")

Hope seeing both examples helps anyone looking for an easy way to upload a pandas DataFrame to S3.

Upvotes: 2

AKX

Reputation: 169174

You're writing to a StringIO(), which has no intrinsic encoding, and you can't upload something to S3 that can't be encoded into bytes (the upload machinery hashes the payload, and hashing requires bytes, hence the TypeError). To fix this without having to re-encode whatever you've written to campaign_buffer (a sketch applying these steps follows the list):

  1. Make your campaign_buffer a BytesIO() instead of a StringIO()
  2. Add mode="wb" and encoding="UTF-8" to the to_csv call
  3. Do campaign_buffer.seek(0) to rewind the in-memory file before uploading
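
Applied to the code from the question, that might look like the following minimal sketch. It assumes pandas 1.2 or newer (binary file handle support in to_csv); the small DataFrame is a hypothetical stand-in for the campaigns_pd built in the question, and the bucket and key are the placeholders used there.

from io import BytesIO

import boto3
import pandas as pd

# Hypothetical stand-in for the campaigns_pd DataFrame from the question
campaigns_pd = pd.DataFrame({"name": ["a", "b"], "clicks": [1, 2]})

campaign_buffer = BytesIO()                                        # 1. bytes buffer instead of StringIO
campaigns_pd.to_csv(campaign_buffer, mode="wb", encoding="UTF-8")  # 2. write as UTF-8 bytes
campaign_buffer.seek(0)                                            # 3. rewind before uploading

client = boto3.client('s3')
client.upload_fileobj(campaign_buffer, 'name', 'folder/test.csv')

The seek(0) matters because upload_fileobj reads from the buffer's current position, which is at the end of the stream right after to_csv finishes writing.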

Upvotes: 6
