Alessandro Calmanovici

Reputation: 101

Apache beam fileio write compressed files

I would like to know if it's possible to write compressed files using the fileio module of the Apache Beam Python SDK. At the moment I am using the module to write files to a GCS bucket:

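```python
import apache_beam as beam
from apache_beam.io import fileio
from apache_beam.transforms import window
_ = (logs | 'Window' >> beam.WindowInto(window.FixedWindows(60 * 60))  # hourly windows
    | 'Convert to JSON' >> beam.ParDo(ConvertToJson())
    | 'Write logs to GCS file' >> fileio.WriteToFiles(path=gsc_output_path, shards=1, max_writers_per_bundle=0))
```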

Compression would help in minimizing storage costs.

According to the documentation and a comment inside the class `_MoveTempFilesIntoFinalDestinationFn`, handling of compression still needs to be implemented by the developers.

Am I right about this, or does someone know how to do it?

Thank you!

Upvotes: 0

Views: 377

Answers (1)

ningk

Reputation: 1383

> developers still need to implement handling of compression.

This is correct.

There are, however, open feature requests (FRs) for it.

For now, you can work around it with a DoFn that runs after the write: read each final file, compress it, write the compressed file, and delete the original uncompressed file.
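A minimal sketch of that workaround follows. It assumes `fileio.WriteToFiles` emits one `FileResult` per final file whose `file_name` field holds the final path. `CompressFinalFileFn` and `gzip_stream` are hypothetical names, and appending `.gz` to the original name is an arbitrary choice. The target is created with `compression_type=CompressionTypes.UNCOMPRESSED` because with the default (`AUTO`) Beam would infer gzip from the `.gz` suffix and compress the data a second time. The Beam import is guarded only so the pure-Python helper can be used without Beam installed.

```python
import gzip
import shutil

try:
    import apache_beam as beam
    from apache_beam.io.filesystem import CompressionTypes
    from apache_beam.io.filesystems import FileSystems
except ImportError:  # Beam not installed: only gzip_stream below is usable.
    beam = None


def gzip_stream(src, dst, chunk_size=1 << 20):
    """Copy binary stream `src` into `dst`, gzip-compressed, chunk by chunk."""
    # Closing the GzipFile writes the gzip trailer but leaves `dst` open.
    with gzip.GzipFile(fileobj=dst, mode="wb") as gz:
        shutil.copyfileobj(src, gz, chunk_size)


if beam is not None:

    class CompressFinalFileFn(beam.DoFn):
        """Hypothetical DoFn: gzip one file written by fileio.WriteToFiles, then delete it."""

        def process(self, file_result):
            src_path = file_result.file_name  # final (uncompressed) file path
            dst_path = src_path + ".gz"
            # UNCOMPRESSED stops Beam from auto-gzipping because of the ".gz" suffix.
            with FileSystems.open(src_path) as src, FileSystems.create(
                    dst_path, compression_type=CompressionTypes.UNCOMPRESSED) as dst:
                gzip_stream(src, dst)
            FileSystems.delete([src_path])  # remove the original uncompressed file
            yield dst_path
```

In the question's pipeline this would be chained directly after the write step, e.g. `... | 'Write logs to GCS file' >> fileio.WriteToFiles(...) | 'Gzip files' >> beam.ParDo(CompressFinalFileFn())`. Streaming fixed-size chunks through `shutil.copyfileobj` keeps memory bounded for large files; the cost is that every file is written twice and the uncompressed copy is briefly visible in the bucket.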

Upvotes: 1
