Reputation: 454
I need to perform an append load to S3 bucket.
Now I need to write this dynamic frame to an S3 bucket that already contains all of the previous days' partitions. In fact, I only need to write a single partition to the bucket. Currently I am using the piece of code below to write data to S3.
// Write it out in Parquet for ERROR severity
glueContext.getSinkWithFormat(
  connectionType = "s3",
  options = JsonOptions(Map(
    "path" -> "s3://some s3 bucket location",
    "partitionKeys" -> Seq("partitonyear", "partitonmonth", "partitonday"))),
  format = "parquet"
).writeDynamicFrame(
  DynamicFrame(dynamicDataframeToWrite.toDF().coalesce(maxExecutors), glueContext))
I am not sure whether the above code performs an append load. Is there a way to achieve this through the AWS Glue libraries?
Upvotes: 2
Views: 3831
Reputation: 4750
Your script will append new data files to the appropriate partition. So if you are processing only today's data, it will create a new partition under the path. For example, if today is 2018-11-28, it will create a new data object in the s3://some_s3_bucket_location/partitonyear=2018/partitonmonth=11/partitonday=28/ folder.
If you write data into an existing partition, Glue will append new files and will not remove the existing objects. However, this can lead to duplicate records if you run the job multiple times over the same data.
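If duplicates are a concern, one common workaround is to bypass the Glue sink and write the underlying DataFrame with Spark's dynamic partition overwrite, which replaces only the partitions present in the data being written and leaves the other partitions untouched. A minimal sketch, assuming Spark 2.3+ and reusing the same hypothetical path and partition column names from the question:

```scala
// Sketch only: overwrite just the partitions contained in this run's data.
// Requires Spark 2.3+; "dynamic" mode replaces only matching partition folders,
// not the whole table root, avoiding duplicates on reruns.
val spark = glueContext.getSparkSession
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

dynamicDataframeToWrite.toDF()
  .coalesce(maxExecutors)
  .write
  .mode("overwrite")                                         // with dynamic mode, scoped per partition
  .partitionBy("partitonyear", "partitonmonth", "partitonday")
  .parquet("s3://some s3 bucket location")
```

Note that with the default `static` overwrite mode the same call would delete every existing partition under the path, so setting the configuration first is essential.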
Upvotes: 2