Lionel
Lionel

Reputation: 596

write window wise result on different file

Is it possible to write window wise result on different file. Means, I need to append time to file prefix or make time wise directory so that I can access particular window result without an additional filter. (as like apache-spark)

Upvotes: 0

Views: 81

Answers (1)

Davor Bonaci
Davor Bonaci

Reputation: 1729

The answer depends on whether you are using windowing in batch or streaming mode.

In the streaming mode, Cloud Dataflow Service doesn't support writing to files at this time. In this case, you'd want to use the BigQuery sink instead, where we do support per-window sharding.

Code example (see Javadoc for more details):

PCollection<TableRow> quotes = ...;
quotes.apply(Window.<TableRow>into(CalendarWindows.days(1)))
  .apply(BigQueryIO.Write
     .named("Write")
     .withSchema(schema)
     .to(new SerializableFunction<BoundedWindow, String>() {
       public String apply(BoundedWindow window) {
         // The cast below is safe because CalendarWindows.days(1) produces IntervalWindows.
         String dayString = DateTimeFormat.forPattern("yyyy_MM_dd")
              .withZone(DateTimeZone.UTC)
              .print(((IntervalWindow) window).start());
         return "my-project:output.output_table_" + dayString;
       }
     }));

In the batch mode, TextIO.Write doesn't have a convenience method ready for such purpose, but you can implement something similar yourself without much trouble. For example, a way to accomplish this is via Partition transform, whose outputs are piped to separate TextIO.Write sinks.

Upvotes: 2

Related Questions