Reputation: 87
I am trying to window data from a Google Cloud Pub/Sub stream into 10-second fixed windows, but I get this error:
java.lang.IllegalArgumentException: Cannot output with timestamp 2019-07-20T12:13:04.875Z. Output timestamps must be no earlier than the timestamp of the current input (2019-07-20T12:13:05.591Z) minus the allowed skew (0 milliseconds). See the DoFn#getAllowedTimestampSkew() Javadoc for details on changing the allowed skew.
    org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.checkTimestamp(SimpleDoFnRunner.java:587)
    org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$DoFnProcessContext.outputWithTimestamp(SimpleDoFnRunner.java:566)
    org.apache.beam.sdk.transforms.DoFnOutputReceivers$WindowedContextOutputReceiver.outputWithTimestamp(DoFnOutputReceivers.java:80)
    org.apache.beam.sdk.transforms.WithTimestamps$AddTimestampsDoFn.processElement(WithTimestamps.java:136)
Here is the code that causes the error:
eventStream
    .apply("Add Event Timestamps",
        WithTimestamps.of((Event event) -> new Instant(event.getTime())))
    .apply("Window Events",
        Window.<Event>into(FixedWindows.of(Duration.parseDuration("10s"))));
What is the cause of this and what is a suitable solution?
Upvotes: 2
Views: 2768
Reputation: 737
From the documentation:
If the input {@link PCollection} elements have timestamps, the output timestamp for each element must not be before the input element's timestamp minus the value of {@link getAllowedTimestampSkew()}. If an output timestamp is before this time, the transform will throw an {@link IllegalArgumentException} when executed. Use {@link withAllowedTimestampSkew(Duration)} to update the allowed skew.
CAUTION: Use of {@link #withAllowedTimestampSkew(Duration)} permits elements to be emitted behind the watermark. These elements are considered late, and if behind the {@link Window#withAllowedLateness(Duration) allowed lateness} of a downstream {@link PCollection} may be silently dropped.
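To make the quoted rule concrete, here is a minimal plain-Java sketch of the comparison it describes, using the two timestamps from the error message in the question. It is only a model of the documented rule, not Beam's actual implementation; the class name TimestampSkewCheck and the method isAllowed are made up for illustration, and it uses java.time rather than the Joda-Time types Beam works with.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: a plain-Java model of the documented rule, not Beam code.
final class TimestampSkewCheck {

    // True when the output timestamp is no earlier than
    // (input timestamp - allowed skew), which is the condition the Javadoc states.
    static boolean isAllowed(Instant output, Instant input, Duration allowedSkew) {
        return !output.isBefore(input.minus(allowedSkew));
    }

    public static void main(String[] args) {
        // Timestamps copied from the exception message in the question.
        Instant input = Instant.parse("2019-07-20T12:13:05.591Z");
        Instant output = Instant.parse("2019-07-20T12:13:04.875Z");

        // How far the output timestamp lies behind the input timestamp.
        System.out.println(Duration.between(output, input).toMillis()); // 716

        // With the default skew of zero the output timestamp is rejected.
        System.out.println(isAllowed(output, input, Duration.ZERO)); // false

        // With an allowed skew of one second the same timestamp passes.
        System.out.println(isAllowed(output, input, Duration.ofSeconds(1))); // true
    }
}
```

In this case the event time is 716 ms earlier than the timestamp already attached to the element, so any allowed skew of at least that much would pass the check, subject to the lateness caveat quoted above.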
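So, to fix the issue, you can raise the allowed skew with withAllowedTimestampSkew, keeping in mind the caution above about late elements.
I used a different API instead: withTimestampAttribute.
You can set an attribute on each Pub/Sub message that carries the timestamp of your JSON/Avro record, and that attribute is then used as the element timestamp when reading.
This API is available when publishing: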
.apply(PubsubIO.writeAvros(Someclass.class)
    .withIdAttribute("id")
    .withTimestampAttribute("myTime")
    .to(topic));
And when subscribing:
.apply(PubsubIO.readAvros(Someclass.class)
    .fromSubscription(...)
    .withIdAttribute("id")
    .withTimestampAttribute("myTime"))
Upvotes: 1