Reputation: 109
I have an ETL job that runs daily, uses bookmarks, and writes the increment to an output S3 bucket. The output is partitioned by a single key.
Now I want to end up with just one file per partition. I can achieve that on the first run of the job as follows:
# collapse the increment into a single Spark partition so that each
# output partition directory ends up with exactly one file
datasource = datasource.repartition(1)

glueContext.write_dynamic_frame.from_options(
    connection_type = "s3",
    frame = datasource,
    connection_options = {"path": output_path, "partitionKeys": ["a_key"]},
    format = "glueparquet",
    format_options = {"compression": "gzip"},
    transformation_ctx = "write_dynamic_frame")
What I can't figure out is how to write my increment so that it ends up compacted together with the files that are already in my output bucket/partition. One option would be to read the table from the previous day and merge it with the increment, but that seems like overkill.
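For reference, a minimal sketch of that read-and-merge option, assuming the same glueContext, datasource, output_path and partition key as in the snippet above (the existing output is read back as plain parquet and unioned with the increment via DataFrames):

from awsglue.dynamicframe import DynamicFrame

# read back what previous runs already wrote to the output location
existing = glueContext.create_dynamic_frame.from_options(
    connection_type = "s3",
    connection_options = {"paths": [output_path]},
    format = "parquet")

# union the old data with today's increment and collapse to one file per partition
merged_df = existing.toDF().unionByName(datasource.toDF()).repartition(1)
merged = DynamicFrame.fromDF(merged_df, glueContext, "merged")

# the merged frame would then have to replace (not append to) the existing files,
# e.g. by writing to a staging path and swapping it in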
Any smarter ideas?
Upvotes: 1
Views: 3149
Reputation: 439
I ran into the same issue and discovered that the compression setting goes in connection_options:
connection_options = {"path": file_path, "compression": "gzip", "partitionKeys": ["a_key"]}
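Applied to the write call from the question, that would look something like this (same frame, path and partition key as above; untested sketch):

glueContext.write_dynamic_frame.from_options(
    connection_type = "s3",
    frame = datasource,
    connection_options = {"path": output_path, "compression": "gzip", "partitionKeys": ["a_key"]},
    format = "glueparquet",
    transformation_ctx = "write_dynamic_frame")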
Upvotes: 3