Reputation: 3505
I am having difficulty understanding the basic Kafka Streams example:
// Construct a `KStream` from the input topic "streams-plaintext-input", where message values
// represent lines of text (for the sake of this example, we ignore whatever may be stored
// in the message keys). The default key and value serdes will be used.
final KStream<String, String> textLines = builder.stream(inputTopic);
final Pattern pattern = Pattern.compile("\\W+", Pattern.UNICODE_CHARACTER_CLASS);
final KTable<String, Long> wordCounts = textLines
    // Split each text line into words on runs of non-word characters (the `\W+` pattern).
    // The text lines are the record values, i.e. we can ignore whatever data is in the
    // record keys and thus invoke `flatMapValues()` instead of the more generic `flatMap()`.
    .flatMapValues(value -> Arrays.asList(pattern.split(value.toLowerCase())))
    // Group the split data by word so that we can subsequently count the occurrences per word.
    // This step re-keys (re-partitions) the input data, with the new record key being the words.
    // Note: No need to specify explicit serdes because the resulting key and value types
    // (String and String) match the application's default serdes.
    .groupBy((keyIgnored, word) -> word)
    // Count the occurrences of each word (record key).
    .count();
// Write the `KTable<String, Long>` to the output topic.
wordCounts.toStream().to(outputTopic, Produced.with(Serdes.String(), Serdes.Long()));
Can someone please explain the .flatMapValues part?
From what I can see, flatMapValues turns the KStream<String, String>
into a
KStream<String, List<String>>
so how can the chained .groupBy take (String, String)
input parameters?
Upvotes: 0
Views: 2196
Reputation: 191874
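.flatMapValues
(like any flatMap-style operator) takes a function that returns a collection and passes each element of that collection to the next operator as a separate, "flattened" record, rather than passing the collection itself. The result here is therefore still a KStream<String, String>, with one record per word and each record keeping the key of the line it came from. It is not a KStream<String, List<String>>, which is why .groupBy receives a String key and a String value. You would get a KStream<String, List<String>> if you used .mapValues instead.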
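The same flattening can be seen without a Kafka cluster. The sketch below is only an analogy: it uses plain `java.util.stream` instead of Kafka Streams, with `map` standing in for `mapValues` and `flatMap` standing in for `flatMapValues`. The class name `FlatMapValuesSketch`, the two helper methods, and the sample lines are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical demo class: an analogy with java.util.stream, not Kafka Streams itself.
public class FlatMapValuesSketch {
    // Same pattern as in the question: split on runs of non-word characters.
    private static final Pattern PATTERN =
            Pattern.compile("\\W+", Pattern.UNICODE_CHARACTER_CLASS);

    // Analogue of mapValues: one output element per line, each element a List<String>.
    static List<List<String>> mapLines(List<String> lines) {
        return lines.stream()
                .map(line -> Arrays.asList(PATTERN.split(line.toLowerCase())))
                .collect(Collectors.toList());
    }

    // Analogue of flatMapValues: the per-line collections are flattened,
    // so the next step sees one String per word.
    static List<String> flatMapLines(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(PATTERN.split(line.toLowerCase())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello kafka streams", "hello world");
        System.out.println(mapLines(lines));     // [[hello, kafka, streams], [hello, world]]
        System.out.println(flatMapLines(lines)); // [hello, kafka, streams, hello, world]
    }
}
```

The element type after `flatMap` is `String`, not `List<String>`, which mirrors why the lambda passed to `.groupBy` in the question sees a single word as its value.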
Upvotes: 1