Reputation: 901
There is an example in the article https://docs.confluent.io/current/ksql/docs/developer-guide/transform-a-stream-with-ksql.html:
CREATE STREAM pageviews_transformed
WITH (TIMESTAMP='viewtime',
PARTITIONS=5,
VALUE_FORMAT='JSON') AS
SELECT viewtime,
userid,
pageid,
TIMESTAMPTOSTRING(viewtime, 'yyyy-MM-dd HH:mm:ss.SSS') AS timestring
FROM pageviews
PARTITION BY userid
EMIT CHANGES;
You can see that partitioning is specified twice. In the WITH clause we define the partition count for the brand new stream (topic). In the PARTITION BY clause we define the key for incoming messages, which determines the partition each message is sent to.
We created a stream with 5 partitions. Suppose we have messages with 6 unique userid values. In this case, how will the messages be distributed over those 5 partitions?
Upvotes: 0
Views: 25
Reputation: 191711
PARTITIONS is the number of Kafka topic partitions.
PARTITION BY defines which Kafka message key is used during record production.
"Let's imagine that we have messages with 6 unique userid values. In this case, how will the messages be distributed over those 5 partitions?"
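Via Kafka's DefaultPartitioner class. For a record with a non-null key it hashes the serialized key bytes with murmur2 and takes the result modulo the number of partitions. Every message with the same userid therefore lands in the same partition, and with 6 distinct keys and only 5 partitions, at least two userid values must share a partition. Which ones collide depends on the hash values, so the spread is not guaranteed to be even.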
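To make the key-to-partition mapping concrete, here is a minimal Python sketch of the hash-then-modulo scheme. The murmur2 function is a port written from the Java client's utility and has not been checked against a live broker. The key names User_1 to User_6 are hypothetical, and the sketch assumes the key is serialized as a plain UTF-8 string, which may not match your key format. Treat the printed partition numbers as illustrative only.

```python
def murmur2(data: bytes) -> int:
    """32-bit murmur2, ported from the Kafka Java client (assumed faithful)."""
    mask = 0xFFFFFFFF          # emulate Java's 32-bit int arithmetic
    seed = 0x9747B28C
    m = 0x5BD1E995
    r = 24
    length = len(data)
    h = (seed ^ length) & mask

    # Mix the input four bytes at a time, little-endian.
    full = length - (length % 4)
    for i in range(0, full, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & mask
        k ^= k >> r
        k = (k * m) & mask
        h = (h * m) & mask
        h ^= k

    # Mix the remaining 1 to 3 bytes, if any.
    tail = data[full:]
    if len(tail) == 3:
        h ^= tail[2] << 16
    if len(tail) >= 2:
        h ^= tail[1] << 8
    if len(tail) >= 1:
        h ^= tail[0]
        h = (h * m) & mask

    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h


def partition_for(key: str, num_partitions: int) -> int:
    """Clear the sign bit of the hash, then take it modulo the partition count."""
    # Assumption: the key is serialized as UTF-8 text.
    return (murmur2(key.encode("utf-8")) & 0x7FFFFFFF) % num_partitions


def assign(keys, num_partitions):
    """Map every key to the partition it would be sent to."""
    return {key: partition_for(key, num_partitions) for key in keys}


if __name__ == "__main__":
    user_ids = [f"User_{i}" for i in range(1, 7)]  # hypothetical userid values
    for key, partition in assign(user_ids, 5).items():
        print(key, "->", partition)
```

Whatever the exact numbers turn out to be, the same userid always maps to the same partition, and six keys cannot occupy more than five partitions, so at least one partition receives more than one userid.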
Upvotes: 1