How do I write a parquet file partitioned by a column to S3? I'm trying:
from io import BytesIO

from airflow.providers.amazon.aws.hooks.s3 import S3Hook


def write_df_into_s3(df, bucket_name, filepath, format="parquet"):
    buffer = None
    hook = S3Hook()
    if format == "parquet":
        buffer = BytesIO()
        df.to_parquet(buffer, index=False, partition_cols=['date'])
    else:
        raise Exception("Format not implemented!")
    hook.load_bytes(buffer.getvalue(), filepath, bucket_name)
    return f"s3://{bucket_name}/{filepath}"
But I get the error: 'NoneType' object has no attribute '_isfilestore'.
Upvotes: 2
Views: 6006
Reputation: 4788
For Python 3.6+, AWS has a library called awswrangler (AWS Data Wrangler) that helps with the integration between pandas, S3, and Parquet.
To install it, run:
pip install awswrangler
If you want to write your pandas DataFrame as a partitioned parquet dataset to S3, do:
import awswrangler as wr

wr.s3.to_parquet(
    df=df,
    path="s3://my-bucket/key/",
    dataset=True,
    partition_cols=["date"]
)
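As a side note on the original error: with partition_cols, pyarrow writes a directory tree of files (one subfolder per partition value), so it needs a real path or filesystem rather than a single in-memory BytesIO, which is why the buffer-based hook approach fails. A minimal sketch of writing the partitioned dataset straight from pandas, assuming a recent pandas with s3fs installed and using a placeholder bucket/prefix:

import pandas as pd

df = pd.DataFrame({"date": ["2021-01-01", "2021-01-02"], "value": [1, 2]})

# pandas hands the s3:// URL to pyarrow/fsspec, so this needs the s3fs package.
# "my-bucket/key/" is a placeholder prefix; one subfolder per date value is created.
df.to_parquet("s3://my-bucket/key/", index=False, partition_cols=["date"])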
Upvotes: 2