Adheeban

Reputation: 11

How do I write all the filenames produced in each window to a single metadata file at the end of that window?

My use case is to write all the parquet filenames to a separate metadata file after the parquet files have been written to GCS at the end of each window.

I have tried several approaches, but each one produces multiple metadata files for a single window, each holding only part of the data: the parquet filenames written in a given window end up spread across several metadata files.

Below is my desired output:

Metadata Filename: gs://my-bucket/path/to/my/metadata-file/metadata-20240117T12:40-20240117T12:45.txt


Metadata File Content:
gs://my-bucket/path/to/my/parquet-file/parquet-20240117T12:40-20240117T12:45-0.parquet
gs://my-bucket/path/to/my/parquet-file/parquet-20240117T12:40-20240117T12:45-1.parquet
gs://my-bucket/path/to/my/parquet-file/parquet-20240117T12:40-20240117T12:45-2.parquet
gs://my-bucket/path/to/my/parquet-file/parquet-20240117T12:40-20240117T12:45-3.parquet
gs://my-bucket/path/to/my/parquet-file/parquet-20240117T12:40-20240117T12:45-4.parquet
gs://my-bucket/path/to/my/parquet-file/parquet-20240117T12:40-20240117T12:45-5.parquet
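
To make the target concrete, here is a minimal plain-Python sketch of the grouping I am after. It is not my pipeline code, only an illustration of the mapping from parquet filenames to one metadata file per window. The regular expression, the `METADATA_DIR` constant, and the `group_by_window` helper are hypothetical names that assume the filename pattern shown above.

```python
import re
from collections import defaultdict

# Assumed pattern, taken from the example filenames above:
# parquet-<window_start>-<window_end>-<shard>.parquet
WINDOW_RE = re.compile(
    r"parquet-(\d{8}T\d{2}:\d{2})-(\d{8}T\d{2}:\d{2})-\d+\.parquet$"
)

# Hypothetical metadata location, matching the desired output above.
METADATA_DIR = "gs://my-bucket/path/to/my/metadata-file"


def group_by_window(parquet_paths):
    """Return {metadata file path: sorted parquet paths for that window}."""
    grouped = defaultdict(list)
    for path in parquet_paths:
        match = WINDOW_RE.search(path)
        if match is None:
            raise ValueError(f"unexpected filename: {path}")
        start, end = match.groups()
        # One metadata file per (start, end) window, never more.
        metadata_path = f"{METADATA_DIR}/metadata-{start}-{end}.txt"
        grouped[metadata_path].append(path)
    return {name: sorted(paths) for name, paths in grouped.items()}
```

Every window should yield exactly one key in the result, with all of that window's parquet filenames as its value. In the pipeline, I assume this corresponds to collecting all filenames of a window under a single key before one write, which is the step that keeps getting split.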

The approaches I tried spread these same six filenames across three or four different metadata files instead of one.

What am I doing wrong here?

Here is my code that does the parquet writing: https://gist.github.com/iamadhee/c1a3c9ce7c4de89f543e32e5a006d0e5

Upvotes: 1

Views: 31

Answers (0)
