Apache Beam DataFrame: write CSV to GCS without the shard name template

Question

I have a Dataflow pipeline that uses the Apache Beam DataFrame API, and I'd like to write the result as a CSV file to a GCS bucket. This is my code:

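import apache_beam as beam
from apache_beam.dataframe.io import read_csv

with beam.Pipeline(options=pipeline_options) as p:
    df = p | read_csv(known_args.input)
    # Fill gaps within each primary_key group: forward fill, then backward fill.
    df[column] = df.groupby(primary_key)[column].apply(lambda x: x.ffill().bfill())
    df.to_csv(known_args.output, index=False, encoding='utf-8')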

However, when I pass a GCS path as known_args.output, the CSV written to GCS gets a shard suffix appended to its name, like this: gs://path/to/file-00000-of-00001. For my project, I need the file name without the shard suffix. I've read the documentation, but there seems to be no option to remove it. I also tried converting the DataFrame back to a PCollection and using WriteToText, but that didn't work either, and it isn't a desirable solution anyway.
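For context, the suffix follows the shard naming pattern "-SSSSS-of-NNNNN", which is the default shard name template of Beam's file-based sinks such as WriteToText. The sketch below is a plain-Python illustration of how such a template expands into the name shown above. It is not Beam's own code, and the function name expand_shard_template is hypothetical.

```python
import re

# Default template used by Beam's file-based sinks such as WriteToText.
DEFAULT_SHARD_NAME_TEMPLATE = "-SSSSS-of-NNNNN"


def expand_shard_template(prefix, template, shard_index, num_shards, suffix=""):
    """Illustrative re-implementation (not Beam's code) of shard naming.

    Each run of 'S' becomes the zero-padded shard index and each run of 'N'
    becomes the zero-padded shard count, padded to the length of the run.
    """
    def pad(value):
        # Build a replacement function that pads to the matched run's width.
        return lambda match: str(value).zfill(len(match.group(0)))

    name = re.sub(r"S+", pad(shard_index), template)
    name = re.sub(r"N+", pad(num_shards), name)
    return prefix + name + suffix


# One shard with the default template reproduces the observed file name.
print(expand_shard_template("gs://path/to/file", DEFAULT_SHARD_NAME_TEMPLATE, 0, 1))
# An empty template leaves the prefix (plus any suffix) untouched.
print(expand_shard_template("gs://path/to/file", "", 0, 1, ".csv"))
```

With WriteToText, an empty shard_name_template combined with num_shards=1 is the documented way to get a name with no shard suffix. The sketch does not show whether the DataFrame to_csv accepts an equivalent option, which is the open part of this question.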
