Sandeep

Reputation: 131

Google Cloud Dataflow TextIO write to .gz file?

How can we create a compressed file in GCS from a Google Cloud Dataflow job?

I am not able to specify a compression type. If the feature does not exist yet, is there a cleaner way to write the result of a Google BigQuery query to a compressed file?

Upvotes: 1

Views: 707

Answers (1)

Matthias Baetens

Reputation: 1553

You'll want to use TextIO to write to files (it is one of the built-in I/O transforms).

You can see an example in the code here:

PCollection<String> lines = ...;
lines.apply(TextIO.write().to("/path/to/file.txt")
  .withSuffix(".txt")
  .withWritableByteChannelFactory(FileBasedSink.CompressionType.GZIP));
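
On newer Beam SDKs (2.2 and later, as far as I know), FileBasedSink.CompressionType is deprecated in favor of Compression, and TextIO.write() has a withCompression(...) method. Below is a minimal sketch under that assumption. The class name GzipTextWrite, the helper writeGzipped, and the local output prefix are made up for illustration, and it assumes the Beam Java SDK plus a runner (for example the direct runner) are on the classpath:

```java
import java.util.Arrays;
import java.util.List;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.io.Compression;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.values.PCollection;

public class GzipTextWrite {

  // Hypothetical helper: writes the given lines as gzip-compressed text
  // shards whose names start with outputPrefix.
  static void writeGzipped(String outputPrefix, List<String> lines) {
    // Default options; the runner is whatever is configured on the classpath.
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());

    // Stand-in for the real input; an explicit coder avoids inference issues.
    PCollection<String> input =
        p.apply(Create.of(lines).withCoder(StringUtf8Coder.of()));

    input.apply(
        TextIO.write()
            .to(outputPrefix)
            .withSuffix(".txt")
            // Assumes Beam 2.2+; older SDKs use withWritableByteChannelFactory.
            .withCompression(Compression.GZIP));

    // Block until the write has finished.
    p.run().waitUntilFinish();
  }

  public static void main(String[] args) {
    // Placeholder local path; pass your own prefix as the first argument.
    String prefix = args.length > 0 ? args[0] : "/tmp/beam-gzip-demo/out";
    writeGzipped(prefix, Arrays.asList("alpha", "beta", "gamma"));
  }
}
```

On Dataflow you would pass a gs:// prefix in your own bucket instead of a local path and select the runner through the pipeline options.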

Edit: you can also export a table from BigQuery to a gzipped file directly from the GUI.

Upvotes: 6
