Reputation: 815
I have a Dataflow pipeline in Java that reads from BigQuery and writes .csv.gz files (in multiple shards) to Google Cloud Storage. Everything is working: the filename policy
and the process in general are fine.
The issue is that when someone downloads one of these .csv.gz shards, the object's path gets prepended to the file name (from what I found, this is the normal default behaviour of Google Cloud Storage). For example, if you save the file CSV_SHARD_1
to the bucket as test_bucket/dev/20241206/CSV_SHARD_1.csv.gz
and then download it, the downloaded file is named:
dev-20241206-CSV_SHARD_1.csv.gz
I need it to keep the name CSV_SHARD_1.csv.gz.
From what I have read, this is fixed by setting the Content-Disposition header
(which is part of the object's metadata), but I cannot find how to do this in the Apache Beam documentation or in any examples. Can someone help with this?
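For reference, this is what I mean by setting Content-Disposition: a rough sketch with the plain google-cloud-storage Java client as a step after the pipeline finishes (not a Beam API; the bucket and object names are the ones from my example above):

```java
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class SetContentDisposition {
  public static void main(String[] args) {
    Storage storage = StorageOptions.getDefaultInstance().getService();

    BlobId blobId = BlobId.of("test_bucket", "dev/20241206/CSV_SHARD_1.csv.gz");
    BlobInfo updated = BlobInfo.newBuilder(blobId)
        // Content-Disposition controls the file name the browser suggests on download.
        .setContentDisposition("attachment; filename=\"CSV_SHARD_1.csv.gz\"")
        .build();

    // Updates only the object's metadata; the object data itself is untouched.
    storage.update(updated);
  }
}
```

I would prefer to set this from within the pipeline itself rather than in a separate step like this.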
I am using Apache Beam TextIO
to write the CSV files; here is an example and the documentation I followed: https://cloud.google.com/dataflow/docs/guides/write-to-cloud-storage#write-files
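The write step looks roughly like this (a simplified sketch; in the real pipeline the lines come from the BigQuery read and the output path and shard prefix are built dynamically by the filename policy):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.Compression;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;

public class WriteCsvShards {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Create.of is just a stand-in here; the real input is the BigQuery read output.
    pipeline
        .apply("CsvLines", Create.of("col1,col2", "a,1", "b,2"))
        .apply("WriteCsvShards",
            TextIO.write()
                // Output prefix; shards land under test_bucket/dev/20241206/
                .to("gs://test_bucket/dev/20241206/CSV_SHARD")
                .withSuffix(".csv.gz")
                // Gzip-compress each shard so the output files are .csv.gz
                .withCompression(Compression.GZIP)
                .withNumShards(2));

    pipeline.run().waitUntilFinish();
  }
}
```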
Upvotes: 0
Views: 43