Tom

Can we call the timestamp specified in SourceFunction#collectWithTimestamp ingestion time?

A SourceFunction emits records through its SourceContext, which provides the method: void collectWithTimestamp(T element, long timestamp);
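To make the signature concrete, here is a minimal, self-contained sketch that does not depend on Flink. The SourceContext interface below is a hypothetical stand-in that only mirrors the shape of collect/collectWithTimestamp, and Reading, RecordingContext and run are illustrative names (the record needs Java 16+). The point it shows is that the source itself chooses which timestamp to attach; here it uses the time carried inside each event.

```java
import java.util.ArrayList;
import java.util.List;

class TimestampedSourceDemo {

    // Hypothetical stand-in for Flink's SourceFunction.SourceContext:
    // only the two emit methods are mirrored, nothing else is modeled.
    interface SourceContext<T> {
        void collect(T element);
        void collectWithTimestamp(T element, long timestamp);
    }

    // Illustrative event type that carries the time at which it happened.
    record Reading(String sensor, double value, long eventTimeMillis) {}

    // Test double that records every element and the timestamp attached to it.
    static final class RecordingContext<T> implements SourceContext<T> {
        final List<T> elements = new ArrayList<>();
        final List<Long> timestamps = new ArrayList<>();

        @Override
        public void collect(T element) {
            elements.add(element);
            timestamps.add(null); // emitted without a timestamp
        }

        @Override
        public void collectWithTimestamp(T element, long timestamp) {
            elements.add(element);
            timestamps.add(timestamp);
        }
    }

    // The "source" decides the timestamp; here it is the event's own time.
    static void run(List<Reading> input, SourceContext<Reading> ctx) {
        for (Reading r : input) {
            ctx.collectWithTimestamp(r, r.eventTimeMillis());
        }
    }

    public static void main(String[] args) {
        List<Reading> input = List.of(
                new Reading("s1", 20.5, 1_000L),
                new Reading("s2", 21.0, 2_500L));
        RecordingContext<Reading> ctx = new RecordingContext<>();
        run(input, ctx);
        System.out.println(ctx.timestamps); // [1000, 2500]
    }
}
```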

Going by the definition of ingestion time, the timestamp provided by the source looks exactly like ingestion time, but I'm not sure I have understood correctly.

But the Javadoc of this method says:

On {@link TimeCharacteristic#IngestionTime}, the timestamp is overwritten with the system's current time, to realize proper ingestion time semantics

I don't quite understand what the Javadoc means.

Answers (1)

David Anderson

If the TimeCharacteristic is IngestionTime, then any timestamp you provide in collectWithTimestamp will be ignored and overwritten with the system's current time.
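As a rough mental model (a simplification, not Flink's actual code), the value passed to collectWithTimestamp ends up treated like this under each of the three legacy time characteristics. TimeMode, effectiveTimestamp and the injected clock are hypothetical names; the clock stands in for the system time so the example is deterministic. The processing-time branch reflects my understanding that no timestamp is attached in that mode.

```java
import java.util.function.LongSupplier;

class TimeCharacteristicModel {

    // Mirrors the three values of Flink's legacy TimeCharacteristic enum;
    // the behavior below is a simplified model, not Flink's implementation.
    enum TimeMode { PROCESSING_TIME, INGESTION_TIME, EVENT_TIME }

    // Returns the timestamp that would end up on the record,
    // or null if the record carries no timestamp at all.
    static Long effectiveTimestamp(TimeMode mode, long sourceTimestamp, LongSupplier clock) {
        switch (mode) {
            case EVENT_TIME:
                return sourceTimestamp;   // the source's value is kept
            case INGESTION_TIME:
                return clock.getAsLong(); // the source's value is overwritten with "now"
            case PROCESSING_TIME:
            default:
                return null;              // no timestamp is attached
        }
    }

    public static void main(String[] args) {
        long brokerTimestamp = 1_600_000_000_000L;          // e.g. a Kafka log-append time
        LongSupplier fixedClock = () -> 1_700_000_000_000L; // pretend "current system time"
        for (TimeMode mode : TimeMode.values()) {
            System.out.println(mode + " -> " + effectiveTimestamp(mode, brokerTimestamp, fixedClock));
        }
    }
}
```

Injecting the clock instead of calling System.currentTimeMillis() directly is only there to keep the sketch testable.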

For example, if your source is Kafka and your events have log-append-time timestamps provided by the Kafka broker, you might be using those timestamps in collectWithTimestamp. But if you then specify ingestion time as the time characteristic, then those event time timestamps will be overwritten.

So the answer to "Can we call the timestamp specified in SourceFunction#collectWithTimestamp ingestion time" is no. The source is free to use whatever logic it likes to produce this timestamp, and it might well be a proper event time timestamp.

A major difference between event time timestamps and ingestion time timestamps is that event time timestamps are reproducible -- you can run the same job twice and get exactly the same results. That's not true with ingestion time.
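The reproducibility point can be illustrated with a small, Flink-free sketch that counts events per 1-second tumbling window. Event, countPerWindow and clockStartingAt are hypothetical names; a fake clock per "run" stands in for the wall clock at the moment each record is read. With event time, both runs over the same input produce identical windows; with ingestion time, the result depends on when and how fast each run happened to read the data.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.LongSupplier;

class ReproducibilityDemo {

    static final long WINDOW_MILLIS = 1_000L; // 1-second tumbling windows

    // Illustrative event: an id plus the time it actually happened.
    record Event(String id, long eventTimeMillis) {}

    // Counts events per window start. With ingestion time the timestamp comes
    // from the clock (when the record was read); with event time, from the event.
    static Map<Long, Integer> countPerWindow(List<Event> events, boolean ingestionTime, LongSupplier clock) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (Event e : events) {
            long ts = ingestionTime ? clock.getAsLong() : e.eventTimeMillis();
            long windowStart = ts - (ts % WINDOW_MILLIS);
            counts.merge(windowStart, 1, Integer::sum);
        }
        return counts;
    }

    // Fake clock for one "run": starts at startMillis and advances stepMillis per read.
    static LongSupplier clockStartingAt(long startMillis, long stepMillis) {
        long[] now = {startMillis};
        return () -> {
            long t = now[0];
            now[0] += stepMillis;
            return t;
        };
    }

    public static void main(String[] args) {
        List<Event> input = List.of(new Event("a", 100), new Event("b", 900), new Event("c", 1_200));

        // Event time: two runs over the same input give identical windows.
        System.out.println(countPerWindow(input, false, null)); // {0=2, 1000=1}
        System.out.println(countPerWindow(input, false, null)); // {0=2, 1000=1}

        // Ingestion time: each run sees different read times, so results differ.
        System.out.println(countPerWindow(input, true, clockStartingAt(5_000, 400))); // {5000=3}
        System.out.println(countPerWindow(input, true, clockStartingAt(9_700, 200))); // {9000=2, 10000=1}
    }
}
```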