Reputation: 596
Is it possible to write window wise result on different file. Means, I need to append time to file prefix or make time wise directory so that I can access particular window result without an additional filter. (as like apache-spark
)
Upvotes: 0
Views: 81
Reputation: 1729
The answer depends on whether you are using windowing in batch or streaming mode.
In the streaming mode, Cloud Dataflow Service doesn't support writing to files at this time. In this case, you'd want to use the BigQuery sink instead, where we do support per-window sharding.
Code example (see Javadoc for more details):
PCollection<TableRow> quotes = ...;
quotes.apply(Window.<TableRow>into(CalendarWindows.days(1)))
.apply(BigQueryIO.Write
.named("Write")
.withSchema(schema)
.to(new SerializableFunction<BoundedWindow, String>() {
public String apply(BoundedWindow window) {
// The cast below is safe because CalendarWindows.days(1) produces IntervalWindows.
String dayString = DateTimeFormat.forPattern("yyyy_MM_dd")
.withZone(DateTimeZone.UTC)
.print(((IntervalWindow) window).start());
return "my-project:output.output_table_" + dayString;
}
}));
In the batch mode, TextIO.Write
doesn't have a convenience method ready for such purpose, but you can implement something similar yourself without much trouble. For example, a way to accomplish this is via Partition
transform, whose outputs are piped to separate TextIO.Write
sinks.
Upvotes: 2