Reputation: 47
I will try to phrase this the best I can. I want to write/append to an xls/xlsx file from Sagemaker that is in an S3 bucket. There is an empty excel sheet for each file type (csv,xls,xlsx) in the S3 bucket that I upload prior to. I am able to write/append a df to the empty csv file in the S3 bucket no problem. But it does not work for xls/xlsx. Here is the code I am using for the csv:
df.to_csv('s3://bucket_name/temp/Database.csv', index=False, mode = 'w', header = False)
Here is the code I am using for the xlsx file:
with pd.ExcelWriter('s3://bucket_name/project/Database.xlsx', mode = 'w', engine="xlsxwriter") as writer:
df.to_excel(writer, "Sheet 1")
writer.save()
Note: For xls, I just change the engine to openpyxl and change the file path to the xls one
I get this from the above code when running for xlsx/xls:
FileCreateError: [Errno 2] No such file or directory: 's3://bucket_name/project/Database.xlsx'
Even though it is in the exact same location as the other one. I am not sure what the problem is but I have not found any solution. I have tried adding 'r' to make it a raw string, I have tried changing the slashes around, but nothing seems to work. Does anyone that has experience with xlsxwriter/openpyxl know what the problem could be?
Upvotes: 3
Views: 1848
Reputation: 5396
pd.to_csv
can handle s3 paths since v0.20.0, but pd.ExcelWriter
can't.
You'll need to s3fs or boto like explained here: Store Excel file exported from Pandas in AWS.
Upvotes: 2