Reputation: 11
Update: adding env.getConfig().setAutoWatermarkInterval(1000L); did not fix the problem.
I guess the problem lies in another part of my code. So firstly a little more background.
The program consumes a JSON stream of mixed message types from a single Kafka queue. It first converts the input into a stream of type ObjectNode. This stream is then split using .split() into around 10 separate streams, which are mapped to streams of POJOs.
These POJO streams are then assigned timestamps before being added to a window (one window per POJO stream), keyed, and then summed and averaged within a custom function, before being sent back to another Kafka queue.
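To make the windowing step concrete, here is a rough plain-Java sketch of what a 5-minute tumbling window with a sum and an average amounts to. It is not the Flink job itself: the class name, the method names and the sample events are made up for illustration, and it assumes non-negative epoch-millisecond timestamps with no window offset.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TumblingWindowSketch {
    static final long WINDOW_MS = 5 * 60 * 1000L; // 5-minute window

    // Start of the tumbling window containing the given event timestamp
    // (assumes non-negative epoch milliseconds and no window offset).
    static long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % WINDOW_MS);
    }

    // Running sum and count for one window; the average is derived from both.
    static final class Aggregate {
        long sum;
        long count;

        double average() {
            return count == 0 ? 0.0 : (double) sum / count;
        }
    }

    // Groups {timestamp, value} pairs by window start and aggregates each group.
    static Map<Long, Aggregate> aggregate(long[][] events) {
        Map<Long, Aggregate> windows = new LinkedHashMap<>();
        for (long[] event : events) {
            Aggregate agg = windows.computeIfAbsent(windowStart(event[0]), k -> new Aggregate());
            agg.sum += event[1];
            agg.count++;
        }
        return windows;
    }

    public static void main(String[] args) {
        // Hypothetical events: {event time in ms, value}
        long[][] events = { {0L, 10L}, {299_999L, 20L}, {300_000L, 5L} };
        aggregate(events).forEach((start, agg) ->
            System.out.println("window " + start + ": sum=" + agg.sum + ", avg=" + agg.average()));
    }
}
```

The first two sample events fall into the window starting at 0 and the third into the window starting at 300000, so each window gets its own sum and average.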
Expanded code Example
public class flinkkafka {
public static void main(String[] args) throws Exception {
//create object mapper to allow object to JSON transform
final ObjectMapper mapper = new ObjectMapper();
final String OUTPUT_QUEUE = "test";
//setup streaming environment
StreamExecutionEnvironment env =
StreamExecutionEnvironment
.getExecutionEnvironment();
//set streaming environment variables from command line
ParameterTool parameterTool = ParameterTool.fromArgs(args);
//set time characteristic to EventTime
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
//set watermark polling interval
env.getConfig().setAutoWatermarkInterval(1000L);
//Enable checkpoints to allow for graceful recovery
env.enableCheckpointing(1000);
//set parallelism
env.setParallelism(1);
//create an initial data stream of mixed messages
DataStream<ObjectNode> messageStream = env.addSource
(new FlinkKafkaConsumer09<>(
parameterTool.getRequired("topic"),
new JSONDeserializationSchema(),
parameterTool.getProperties()))
.assignTimestampsAndWatermarks(new
BoundedOutOfOrdernessTimestampExtractor<ObjectNode>
(Time.seconds(10)){
private static final long serialVersionUID = 1L;
@Override
public long extractTimestamp(ObjectNode value) {
DateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ENGLISH);
long tmp = 0L;
try {
tmp =
format.parse(value.get("EventReceivedTime")
.asText()).getTime();
} catch (ParseException e) {
e.printStackTrace();
}
System.out.println("Assigning timestamp " +
tmp);
return tmp;
}
});
//split stream by message type
SplitStream<ObjectNode> split = messageStream.split(new
OutputSelector<ObjectNode>(){
private static final long serialVersionUID = 1L;
@Override
public Iterable<String> select(ObjectNode value){
List<String> output = new ArrayList<String>();
switch (value.get("name").asText()){
case "one":
switch (value.get("info").asText()){
case "two":
output.add("two");
System.out.println("Sending message to two stream");
break;
case "three":
output.add("three");
System.out.println("Sending message to three stream");
break;
case "four":
output.add("four");
System.out.println("Sending message to four stream");
break;
case "five":
output.add("five");
System.out.println("Sending message to five stream");
break;
case "six":
output.add("six");
System.out.println("Sending message to six stream");
break;
default:
break;
}
break;
case "seven":
output.add("seven");
System.out.println("Sending message to seven stream");
break;
case "eight":
output.add("eight");
System.out.println("Sending message to eight stream");
break;
case "nine":
output.add("nine");
System.out.println("Sending message to nine stream");
break;
case "ten":
switch (value.get("info").asText()){
case "eleven":
output.add("eleven");
System.out.println("Sending message to eleven stream");
break;
case "twelve":
output.add("twelve");
System.out.println("Sending message to twelve stream");
break;
default:
break;
}
break;
default:
output.add("failed");
break;
}
return output;
}
});
//assign splits to new data streams
DataStream<ObjectNode> two = split.select("two");
//assigning more splits to streams
//convert ObjectNodes to POJO
DataStream<Two> twoStream = two.map(new MapFunction<ObjectNode, Two>(){
private static final long serialVersionUID = 1L;
@Override
public Two map(ObjectNode value) throws Exception {
Two stream = new Two();
stream.Time = value.get("Time").asText();
stream.value = value.get("value").asLong();
return stream;
}
});
DataStream<String> keyedTwo = twoStream
.keyBy("name")
.timeWindow(Time.minutes(5))
.apply(new twoSum())
.map(new MapFunction<Two, String>(){
private static final long serialVersionUID = 1L;
@Override
public String map(Two value) throws Exception {
return mapper.writeValueAsString(value);
}
});
keyedTwo.addSink(new FlinkKafkaProducer09<String>
(parameterTool.getRequired("bootstrap.servers"),
OUTPUT_QUEUE, new SimpleStringSchema()));
env.execute();
I am attempting to use Flink to aggregate a Kafka queue and push the data stream back to Kafka. The aggregation uses a 5-minute event-time window. The program compiles and runs, but the collected data never leaves the window to be passed to the aggregation function, so no messages are ever delivered to Kafka. However, if I comment out the EventTime characteristic, the program runs and produces results. I have no idea where I am going wrong.
EventTime Code
StreamExecutionEnvironment env =
StreamExecutionEnvironment.getExecutionEnvironment();
ParameterTool parameterTool = ParameterTool.fromArgs(args);
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
env.enableCheckpointing(1000);
DataStream<FrontEnd> frontEndStream = frontEnd.map(new
MapFunction<ObjectNode, FrontEnd>(){
private static final long serialVersionUID = 1L;
@Override
public FrontEnd map(ObjectNode value) throws Exception {
FrontEnd front = new FrontEnd();
front.eventTime = value.get("EventReceivedTime").asText();
return front;
}
}).assignTimestampsAndWatermarks(new
BoundedOutOfOrdernessTimestampExtractor<FrontEnd>(Time.seconds(10)){
private static final long serialVersionUID = 1L;
@Override
public long extractTimestamp(FrontEnd value) {
DateFormat format = new SimpleDateFormat("yyyy-MM-ddHH:mm:ss",Locale.ENGLISH);
long tmp = 0L;
try {
tmp = format.parse(value.eventTime).getTime();
} catch (ParseException e) {
e.printStackTrace();
}
return tmp;
}
});
DataStream<String> keyedFrontEnd = frontEndStream
.keyBy("name")
.timeWindow(Time.minutes(5))
.apply(new FrontEndSum())
.map(new MapFunction<FrontEnd, String>(){
private static final long serialVersionUID = 1L;
@Override
public String map(FrontEnd value) throws Exception {
return mapper.writeValueAsString(value);
}
});
keyedFrontEnd.addSink(new FlinkKafkaProducer09<String>
(parameterTool.getRequired("bootstrap.servers"), OUTPUT_QUEUE, new
SimpleStringSchema()));
env.execute();
}
}
I have tried with the timestamp extractor attached to the incoming stream and with one attached to each of the POJO streams. Again, this code runs without event time and produces the expected result: a stream of JSON strings with the expected aggregations. However, once event time is enabled, the windows never produce a result.
Upvotes: 1
Views: 2357
Reputation: 669
@JayaAnanthram, yes, it works if setAutoWatermarkInterval is called after setStreamTimeCharacteristic. The reason is that setStreamTimeCharacteristic overrides the value set by setAutoWatermarkInterval, according to the code:
public void setStreamTimeCharacteristic(TimeCharacteristic characteristic) {
this.timeCharacteristic = Preconditions.checkNotNull(characteristic);
if (characteristic == TimeCharacteristic.ProcessingTime) {
getConfig().setAutoWatermarkInterval(0);
} else {
getConfig().setAutoWatermarkInterval(200);
}
}
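The effect of the call order can be reproduced without Flink. The sketch below uses a made-up stand-in class (FakeEnv is hypothetical, not a Flink type) whose setter simply copies the logic of the method quoted above, so it only illustrates the ordering issue and is not a statement about any other Flink behaviour.

```java
public class WatermarkIntervalOrderSketch {
    enum TimeCharacteristic { ProcessingTime, IngestionTime, EventTime }

    // Hypothetical stand-in for the execution environment; it models only the two setters.
    static final class FakeEnv {
        private long autoWatermarkInterval;

        void setAutoWatermarkInterval(long intervalMs) {
            this.autoWatermarkInterval = intervalMs;
        }

        long getAutoWatermarkInterval() {
            return autoWatermarkInterval;
        }

        // Same branching as the quoted setStreamTimeCharacteristic method.
        void setStreamTimeCharacteristic(TimeCharacteristic characteristic) {
            if (characteristic == TimeCharacteristic.ProcessingTime) {
                setAutoWatermarkInterval(0);
            } else {
                setAutoWatermarkInterval(200);
            }
        }
    }

    // Interval set first, characteristic second: the custom value is overwritten.
    static long intervalThenCharacteristic() {
        FakeEnv env = new FakeEnv();
        env.setAutoWatermarkInterval(1000L);
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        return env.getAutoWatermarkInterval();
    }

    // Characteristic set first, interval second: the custom value survives.
    static long characteristicThenInterval() {
        FakeEnv env = new FakeEnv();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        env.setAutoWatermarkInterval(1000L);
        return env.getAutoWatermarkInterval();
    }

    public static void main(String[] args) {
        System.out.println("interval first:       " + intervalThenCharacteristic());
        System.out.println("characteristic first: " + characteristicThenInterval());
    }
}
```

In this model the first ordering ends up with 200 and the second with 1000, which matches the override described above.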
Upvotes: 0
Reputation: 36
My first inclination is always to assume a TimeZone issue.
What is the timezone of the "EventReceivedTime" field in your Kafka payload?
SimpleDateFormat will parse in the LOCAL JVM time zone:
DateFormat format = new SimpleDateFormat("yyyy-MM-ddHH:mm:ss",Locale.ENGLISH);
You can add
format.setTimeZone(TimeZone.getTimeZone("GMT"));
to parse your string as GMT, for example, if that is what the text represents. You should make sure that the timezone/offset of all your dates, watermarks, etc. match, and are all compared in UTC / epoch time (which is what you get once you extract the Long out of it).
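A small standalone sketch of how much the zone matters: the same text parsed under two zones gives two different epoch values. The sample value, the pattern with a space between date and time, and the helper name parseIn are assumptions for illustration; adjust them to whatever your payload actually contains.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

public class TimeZoneParseSketch {
    // Parses the text as a wall-clock time in the given zone and returns epoch milliseconds.
    static long parseIn(String text, String zoneId) throws ParseException {
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ENGLISH);
        format.setTimeZone(TimeZone.getTimeZone(zoneId));
        return format.parse(text).getTime();
    }

    public static void main(String[] args) throws ParseException {
        String sample = "2016-09-13 10:00:00"; // hypothetical EventReceivedTime value

        long asGmt = parseIn(sample, "GMT");
        long asGmtPlus2 = parseIn(sample, "GMT+02:00");

        // 10:00 in GMT+02:00 is 08:00 GMT, so that instant lies two hours earlier.
        System.out.println("difference in ms: " + (asGmt - asGmtPlus2));
    }
}
```

If the JVM default zone differs from the zone the producer wrote the timestamps in, every extracted timestamp is shifted by that offset.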
Upvotes: 0
Reputation: 18987
The BoundedOutOfOrdernessTimestampExtractor implements the AssignerWithPeriodicWatermarks interface, which means that Flink periodically queries the current watermark.
You have to configure the polling interval via the ExecutionConfig:
env.getConfig().setAutoWatermarkInterval(1000L); // poll watermark every second
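For intuition about why a window may never fire, here is a simplified plain-Java model of a bounded-out-of-orderness watermark. It assumes the watermark is the highest timestamp seen so far minus the allowed out-of-orderness, and that an event-time window can fire once the watermark reaches the window's last millisecond. The class and method names are made up, and the constants mirror the 10-second bound and 5-minute window from the question.

```java
public class WatermarkSketch {
    static final long WINDOW_MS = 5 * 60 * 1000L;           // 5-minute window
    static final long MAX_OUT_OF_ORDERNESS_MS = 10 * 1000L;  // 10-second bound

    // Watermark after seeing the given timestamps:
    // highest timestamp so far minus the allowed out-of-orderness.
    static long watermarkAfter(long[] timestamps) {
        long maxSeen = Long.MIN_VALUE + MAX_OUT_OF_ORDERNESS_MS; // avoids underflow below
        for (long t : timestamps) {
            maxSeen = Math.max(maxSeen, t);
        }
        return maxSeen - MAX_OUT_OF_ORDERNESS_MS;
    }

    // A window [start, start + WINDOW_MS) can fire once the watermark
    // has reached its last millisecond.
    static boolean windowCanFire(long windowStart, long watermark) {
        return watermark >= windowStart + WINDOW_MS - 1;
    }

    public static void main(String[] args) {
        long[] advancing = { 0L, 120_000L, 310_000L }; // timestamps move forward
        long[] stuck = { 0L, 0L, 0L };                 // every timestamp is 0

        System.out.println("advancing: " + windowCanFire(0L, watermarkAfter(advancing)));
        System.out.println("stuck:     " + windowCanFire(0L, watermarkAfter(stuck)));
    }
}
```

In this model the window only fires for the advancing timestamps. If extractTimestamp returns 0 for every record, which is what the question's code does whenever parsing throws a ParseException, the watermark never moves past the end of the first window.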
Upvotes: 1