Reputation: 815
I have a Dataflow pipeline in Java that reads from BigQuery and writes .csv.gz files (in multiple shards) to Google Cloud Storage. Everything is working: the filename policy
and the process in general are fine.
The issue is that when someone downloads one of these .csv.gz shards, the object's path gets prepended to the file name (from what I found, this is the normal default behaviour of Google Cloud Storage). For example, if you save the file CSV_SHARD_1
to the bucket as test_bucket/dev/20241206/CSV_SHARD_1.csv.gz
and then download it, the downloaded file is named:
dev-20241206-CSV_SHARD_1.csv.gz
I need it to keep the name CSV_SHARD_1.csv.gz.
From what I have read, this is fixed by setting the Content-Disposition header
(which is part of the object's metadata), but I cannot find how to do this in the Apache Beam documentation or in any examples. Can someone help with this?
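For reference, this is what I mean by setting Content-Disposition: a rough sketch with the plain google-cloud-storage Java client as a step after the pipeline finishes (not a Beam API; the bucket and object names are the ones from my example above):

```java
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class SetContentDisposition {
  public static void main(String[] args) {
    Storage storage = StorageOptions.getDefaultInstance().getService();

    BlobId blobId = BlobId.of("test_bucket", "dev/20241206/CSV_SHARD_1.csv.gz");
    BlobInfo updated = BlobInfo.newBuilder(blobId)
        // Content-Disposition controls the file name the browser suggests on download.
        .setContentDisposition("attachment; filename=\"CSV_SHARD_1.csv.gz\"")
        .build();

    // Updates only the object's metadata; the object data itself is untouched.
    storage.update(updated);
  }
}
```

I would prefer to set this from within the pipeline itself rather than in a separate step like this.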
I am using Apache Beam TextIO
to write the CSV files; here is an example and the documentation I followed: https://cloud.google.com/dataflow/docs/guides/write-to-cloud-storage#write-files
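The write step looks roughly like this (a simplified sketch; in the real pipeline the lines come from the BigQuery read and the output path and shard prefix are built dynamically by the filename policy):

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.Compression;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;

public class WriteCsvShards {
  public static void main(String[] args) {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Create.of is just a stand-in here; the real input is the BigQuery read output.
    pipeline
        .apply("CsvLines", Create.of("col1,col2", "a,1", "b,2"))
        .apply("WriteCsvShards",
            TextIO.write()
                // Output prefix; shards land under test_bucket/dev/20241206/
                .to("gs://test_bucket/dev/20241206/CSV_SHARD")
                .withSuffix(".csv.gz")
                // Gzip-compress each shard so the output files are .csv.gz
                .withCompression(Compression.GZIP)
                .withNumShards(2));

    pipeline.run().waitUntilFinish();
  }
}
```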
Upvotes: 0
Views: 43