coderguy
coderguy

Reputation: 47

Df.to_csv works with S3 bucket, but df.to_excel does not

I will try to phrase this the best I can. I want to write/append to an xls/xlsx file from Sagemaker that is in an S3 bucket. There is an empty excel sheet for each file type (csv,xls,xlsx) in the S3 bucket that I upload prior to. I am able to write/append a df to the empty csv file in the S3 bucket no problem. But it does not work for xls/xlsx. Here is the code I am using for the csv:

df.to_csv('s3://bucket_name/temp/Database.csv', index=False, mode = 'w', header = False)

Here is the code I am using for the xlsx file:

with pd.ExcelWriter('s3://bucket_name/project/Database.xlsx', mode = 'w', engine="xlsxwriter") as writer:
    df.to_excel(writer, "Sheet 1")  
    writer.save()

Note: For xls, I just change the engine to openpyxl and change the file path to the xls one

I get this from the above code when running for xlsx/xls:

FileCreateError: [Errno 2] No such file or directory: 's3://bucket_name/project/Database.xlsx'

Even though it is in the exact same location as the other one. I am not sure what the problem is but I have not found any solution. I have tried adding 'r' to make it a raw string, I have tried changing the slashes around, but nothing seems to work. Does anyone that has experience with xlsxwriter/openpyxl know what the problem could be?

Upvotes: 3

Views: 1848

Answers (1)

RobinFrcd
RobinFrcd

Reputation: 5396

pd.to_csv can handle s3 paths since v0.20.0, but pd.ExcelWriter can't.

You'll need to s3fs or boto like explained here: Store Excel file exported from Pandas in AWS.

Upvotes: 2

Related Questions