Reputation: 2816
I have two raw streams and I am joining those streams and then I want to count what is the total number of events that have been joined and how much events have not. I am doing this by using map on joinedEventDataStream
as shown below
joinedEventDataStream.map(new RichMapFunction<JoinedEvent, Object>() {
@Override
public Object map(JoinedEvent joinedEvent) throws Exception {
number_of_joined_events += 1;
return null;
}
});
Question # 1: Is this the appropriate way to count the number of events in the stream?
Question # 2: I have noticed a wired behavior, which some of you might not believe. The issue is that when I run my Flink program in IntelliJ IDE, it shows me correct value for number_of_joined_events
but 0
in the case when I submit this program as jar
. So I am getting the initial value of number_of_joined_events
when I run the program as a jar
file instead of the actual count. Why is this happening only in case of jar
file submission and not in IDE?
Upvotes: 1
Views: 2585
Reputation: 18997
Your approach is not working. The behavior you noticed when executing the program via a JAR file is expected.
I don't know how number_of_joined_events
is defined, but I assume its a static variable in your program. When you run the program in your IDE, it runs in a single JVM. Hence, all operators have access to the static variable. When you submit a JAR file to a remote process, the program is executed in a different JVM (possibly multiple JVMs) and the static variable in your client process is never updated.
You can use Flink's metrics or a ReduceFunction
that sums 1
s to count the number of processed records.
Upvotes: 2