Double partition property defining in KSQL

Question

There is an example in the article https://docs.confluent.io/current/ksql/docs/developer-guide/transform-a-stream-with-ksql.html:

CREATE STREAM pageviews_transformed
  WITH (TIMESTAMP='viewtime',
        PARTITIONS=5,
        VALUE_FORMAT='JSON') AS
  SELECT viewtime,
         userid,
         pageid,
         TIMESTAMPTOSTRING(viewtime, 'yyyy-MM-dd HH:mm:ss.SSS') AS timestring
  FROM pageviews
  PARTITION BY userid
  EMIT CHANGES;

You can see that there is double partitions property defining. In WITH clause we define partitions count for brand new stream (topic). In GROUP BY clause - for incoming messages so as to be able to define to what partition send a message.

We created a stream with 5 partitions. Let's imagine that we have messages with 6 unique userid. In this case how will messages be distributed over that 5 partitions?

OneCricketeer · Accepted Answer

PARTITIONS is the number of Kafka topic partitions

PARTITION BY defines which kafka message key is used during record production

Let's imagine that we have messages with 6 unique userid. In this case how will messages be distributed over that 5 partitions

Via Kafka's DefaultPartioner class

Double partition property defining in KSQL

Answers (1)

Related Questions