Reputation: 469
I have to write a Spark DataFrame to an S3 bucket, and it should create a separate parquet file for each partition.
Here is my code:
from awsglue.dynamicframe import DynamicFrame

# Convert the Spark DataFrame into a Glue DynamicFrame
dynamicDataFrame = DynamicFrame.fromDF(
    testDataFrame, glueContext, "dynamicDataFrame")

# Write to S3 as parquet, partitioned by COL_NAME
glueContext.write_dynamic_frame.from_options(
    frame=dynamicDataFrame,
    connection_type="s3",
    connection_options={
        "path": "s3://BUCKET_NAME/DIR_NAME",
        "partitionKeys": ["COL_NAME"]
    },
    format="parquet"
)
When I specify the "partitionKeys": ["COL_NAME"] option, the Glue job runs without any error, but it does not create any files in S3.
When I remove the "partitionKeys" option, it creates 200 parquet files in S3 (the default number of partitions is 200). But I want to partition the output by a particular column.
So, is it possible to create partition-wise parquet files in S3 while writing a DataFrame to S3?
Note: I am using AWS resources, i.e. AWS Glue.
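For reference, the result I am after could also be expressed with the plain Spark DataFrame writer; this is a minimal sketch, assuming the same testDataFrame and COL_NAME, with the S3 path kept as a placeholder:

    # Sketch: partitioned parquet output via the Spark DataFrame writer
    # (testDataFrame and COL_NAME as above; the path is a placeholder).
    testDataFrame.write \
        .mode("overwrite") \
        .partitionBy("COL_NAME") \
        .parquet("s3://BUCKET_NAME/DIR_NAME")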
Upvotes: 0
Views: 1645
Reputation: 2144
Are you sure the partition column has data?
Do you find anything in the Glue logs?
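A quick way to check is to count nulls in the partition column before converting to a DynamicFrame; this is a minimal sketch, assuming the testDataFrame and COL_NAME from the question:

    from pyspark.sql import functions as F

    # Count rows where the partition key is null; a partition column that is
    # entirely null would explain a partitioned write producing no useful output.
    null_rows = testDataFrame.filter(F.col("COL_NAME").isNull()).count()
    total_rows = testDataFrame.count()
    print("Null partition keys: {} of {} rows".format(null_rows, total_rows))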
Upvotes: 1