Paul

Reputation: 13

Sliding window in PyFlink

I am new to PyFlink. I have a Kafka stream whose records contain phone_number, host_name and event_time, all as strings. How can I compute the number of visits for each (phone_number, host_name) pair during the last 24 hours using the DataStream API of PyFlink?

I looked at the examples on the official GitHub, but I don't understand whether I need to define a watermark strategy or not.

Upvotes: 0

Views: 54

Answers (1)

David Anderson

Reputation: 43697

Watermarks are necessary if both of these conditions hold:

  1. You are doing something temporal -- i.e., counting something in a 24-hour window, as you are doing.
  2. You are relying on timestamps in the events as the source of timing information (as opposed to relying on the time-of-day system clock at the time of event processing).
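
To make the second condition concrete, here is a minimal plain-Python sketch (not PyFlink code) of the two possible time sources. It assumes the event_time string is ISO 8601 and that a value without an offset means UTC; the question does not state the actual format, and the sample record and both function names are made up for illustration. The result is in epoch milliseconds, which is the unit Flink uses for event timestamps.

```python
import time
from datetime import datetime, timezone


def event_time_millis(event):
    """Event time: taken from the record itself.

    Assumption: event_time is ISO 8601; a value with no offset is treated as UTC.
    """
    dt = datetime.fromisoformat(event["event_time"])
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def processing_time_millis():
    """Processing time: the system clock at the moment the record is handled."""
    return int(time.time() * 1000)


# Hypothetical record shaped like the ones described in the question.
sample = {
    "phone_number": "555-0100",
    "host_name": "example.com",
    "event_time": "2024-01-01T00:00:00",
}

print(event_time_millis(sample))   # fixed by the record, same on every run
print(processing_time_millis())    # depends on when this line executes
```

Replaying the same Kafka data later gives the same event-time result but a different processing-time result, which is why only the event-time variant needs watermarks to tell the job how far time has advanced.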

So in your case, you will need to define a watermark strategy if you are using event time rather than processing time.
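
As a rough illustration of what a watermark strategy buys you, the following plain-Python sketch simulates a 24-hour sliding event-time window that counts visits per (phone_number, host_name). It is a simplified model, not the PyFlink API: the one-hour slide and the five-minute out-of-orderness bound are assumptions chosen for the example, and `count_visits` and `window_starts` are hypothetical names. In PyFlink itself, the corresponding building blocks would be `WatermarkStrategy.for_bounded_out_of_orderness(...)` with a timestamp assigner, and `SlidingEventTimeWindows.of(...)` on the keyed stream.

```python
from collections import defaultdict

HOUR = 3_600_000                 # milliseconds
WINDOW_SIZE = 24 * HOUR          # from the question: last 24 hours
SLIDE = 1 * HOUR                 # assumption: emit an updated count every hour
MAX_OUT_OF_ORDER = 5 * 60_000    # assumption: events arrive at most 5 minutes late


def window_starts(ts, size=WINDOW_SIZE, slide=SLIDE):
    """Start times of every sliding window [start, start + size) that contains ts."""
    last_start = ts - ts % slide
    return range(last_start, ts - size, -slide)


def count_visits(events, size=WINDOW_SIZE, slide=SLIDE, lateness=MAX_OUT_OF_ORDER):
    """events: (phone_number, host_name, ts_millis) tuples in arrival order.

    Returns (key, window_start, count) for each window, emitted once the
    watermark shows that the window is complete.
    """
    counts = defaultdict(int)        # (key, window_start) -> visits so far
    watermark = float("-inf")
    results = []
    for phone, host, ts in events:
        key = (phone, host)
        for start in window_starts(ts, size, slide):
            # Only count the event in windows that have not fired yet.
            if start + size - 1 > watermark:
                counts[(key, start)] += 1
        # Bounded out-of-orderness: assume nothing older than this will arrive.
        watermark = max(watermark, ts - lateness - 1)
        # A window is complete once the watermark reaches its last millisecond.
        done = sorted(k for k in counts if k[1] + size - 1 <= watermark)
        for k in done:
            results.append((k[0], k[1], counts.pop(k)))
    return results


events = [
    ("555-0100", "a.example", 0),
    ("555-0100", "a.example", 30 * 60_000),
    ("555-0100", "a.example", 25 * HOUR),
]
results = count_visits(events)
print((("555-0100", "a.example"), 0, 2) in results)
```

Nothing is emitted for the first two events, because the watermark has not yet passed the end of any window containing them. Only the third event, 25 hours later, advances the watermark far enough to close those windows. With processing time there is no such bookkeeping: windows close according to the wall clock, so no watermark strategy is needed.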

Upvotes: 0
