Reputation: 4842
I have an AWS Lambda function which queries an API and creates a dataframe. I want to write this dataframe to an S3 bucket as a CSV file. I am using:
import pandas as pd
import s3fs
df.to_csv('s3.console.aws.amazon.com/s3/buckets/info/test.csv', index=False)
I am getting an error:
No such file or directory: 's3.console.aws.amazon.com/s3/buckets/info/test.csv'
But that directory exists, because I am reading files from there. What is the problem here?
I've read the previous files like this:
import boto3
s3_client = boto3.client('s3')
s3_client.download_file('info', 'secrets.json', '/tmp/secrets.json')
How can I upload the whole dataframe to an S3 bucket?
Upvotes: 14
Views: 38566
Reputation: 5599
You can use AWS SDK for Pandas, a library that extends Pandas to work smoothly with AWS data stores.
import awswrangler as wr
df = wr.s3.read_csv("s3://bucket/file.csv")
The library is available in AWS Lambda with the addition of the layer called AWSSDKPandas-Python.
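Since the question is about writing rather than reading, here is a minimal sketch of the corresponding write call, assuming the bucket and key from the question:
import awswrangler as wr
# Write the dataframe as CSV to S3 (bucket/key taken from the question; adjust as needed)
wr.s3.to_csv(df=df, path="s3://info/test.csv", index=False)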
Upvotes: 2
Reputation: 6355
You can also use the boto3 package to store data to S3:
from io import StringIO # python3 (or BytesIO for python2)
import boto3
bucket = 'info' # already created on S3
csv_buffer = StringIO()
df.to_csv(csv_buffer)
s3_resource = boto3.resource('s3')
s3_resource.Object(bucket, 'df.csv').put(Body=csv_buffer.getvalue())
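The same upload can also be done through the lower-level client API instead of the resource API; a minimal sketch, reusing the bucket and buffer from the snippet above:
import boto3
# Equivalent upload using the S3 client instead of the resource
s3_client = boto3.client('s3')
s3_client.put_object(Bucket=bucket, Key='df.csv', Body=csv_buffer.getvalue())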
Upvotes: 35
Reputation: 2137
This
"s3.console.aws.amazon.com/s3/buckets/info/test.csv"
is not an S3 URI; it is the URL of the S3 web console. You need to pass an S3 URI (of the form s3://<bucket>/<key>) to save to S3. Moreover, you do not need to import s3fs (it only needs to be installed).
Just try:
import pandas as pd
df = pd.DataFrame()
# df.to_csv("s3://<bucket_name>/<obj_key>")
# In your case
df.to_csv("s3://info/test.csv")
NOTE: You need to create the bucket on AWS S3 first.
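If your Lambda execution role does not provide credentials implicitly, pandas (1.2+) can forward credentials to s3fs via storage_options; a minimal sketch, where the key and secret values are placeholders:
import pandas as pd
df = pd.DataFrame()
# storage_options is passed through to s3fs; replace the placeholder credentials
df.to_csv(
    "s3://info/test.csv",
    index=False,
    storage_options={"key": "<access_key_id>", "secret": "<secret_access_key>"},
)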
Upvotes: 25