Rj1

Reputation: 469

Is it possible to write a partitioned DataFrame into an S3 bucket?

I have to write a Spark DataFrame into an S3 bucket, and it should create a separate Parquet file for each partition.

Here is my code:

from awsglue.dynamicframe import DynamicFrame

# Convert the Spark DataFrame into a Glue DynamicFrame
dynamicDataFrame = DynamicFrame.fromDF(
    testDataFrame, glueContext,
    "dynamicDataFrame")

# Write the DynamicFrame to S3 as Parquet, partitioned by COL_NAME
glueContext.write_dynamic_frame.from_options(
    frame=dynamicDataFrame,
    connection_type="s3",
    connection_options={
        "path": "s3://BUCKET_NAME/DIR_NAME",
        "partitionKeys": ["COL_NAME"]
    },
    format="parquet")

When I specify the "partitionKeys": ["COL_NAME"] option, the Glue job executes without any error, but it does not create any files in S3.

And when I remove the "partitionKeys" option, it creates 200 Parquet files in S3 (the default number of partitions is 200). But I want the partitions to be based on a particular column.

So, is it possible to create partition-wise Parquet files in S3 while writing a DataFrame to S3?
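For comparison, the plain Spark writer below produces the kind of layout I am after (a sketch reusing the same placeholder bucket, prefix, and column name from my code above):

# Spark-native write: one sub-directory per distinct value of COL_NAME,
# e.g. s3://BUCKET_NAME/DIR_NAME/COL_NAME=<value>/part-*.parquet
testDataFrame.write \
    .partitionBy("COL_NAME") \
    .mode("overwrite") \
    .parquet("s3://BUCKET_NAME/DIR_NAME")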

Note: I am using AWS resources, i.e. AWS Glue.

Upvotes: 0

Views: 1645

Answers (1)

Sandeep Fatangare

Reputation: 2144

Are you sure the partition column has data?

Do you find anything in the Glue logs?
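To check the first point quickly, something like this sketch works (testDataFrame and COL_NAME are the names from the question):

from pyspark.sql import functions as F

# Count rows with a null partition value versus rows with an actual value;
# if every value is null, there is nothing to build partition folders from
testDataFrame.groupBy(
    F.col("COL_NAME").isNull().alias("partition_value_is_null")
).count().show()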

Upvotes: 1
