Reputation: 21
1) Is a key required on the stream on which you want to perform an aggregate function? I have read several blogs, and also a recommendation from Confluent, saying that a KEY is required for aggregation functions to work.
CREATE STREAM Employee (EmpId BIGINT, EmpName VARCHAR,
DeptId BIGINT, SAL BIGINT) WITH (KAFKA_TOPIC='EmpTopic',
VALUE_FORMAT='JSON');
While defining the above stream, I have not defined any KEY (ROWKEY is NULL). The underlying topic 'EmpTopic' also does not have a KEY.
I am performing an aggregation on the stream:
CREATE TABLE SALBYDEPT AS
SELECT DeptId,
SUM(SAL)
FROM Employee
GROUP BY DeptId;
Please confirm whether performing an aggregation on the above stream requires a KEY on the 'Employee' stream, i.e. a NOT NULL ROWKEY on the 'Employee' stream.
2) As per the Confluent documentation, "Windowing lets you control how to group records that have the same key for stateful operations, like aggregations or joins, into time spans. KSQL tracks windows per record key". Please help me understand the meaning of the above statement. Is the stream required to have a NOT NULL KEY?
3) Will a JOIN on Stream-Table retain the KEY?
CREATE TABLE users
(registertime BIGINT,
userid VARCHAR,
gender VARCHAR,
regionid VARCHAR)
WITH (KAFKA_TOPIC = 'users',
VALUE_FORMAT='JSON',
KEY = 'userid');
CREATE STREAM pageviews
(viewtime BIGINT,
userid VARCHAR,
pageid VARCHAR)
WITH (KAFKA_TOPIC='pageviews',
VALUE_FORMAT='DELIMITED',
KEY='pageid',
TIMESTAMP='viewtime');
CREATE STREAM pageviews_transformed as
SELECT viewtime,
userid,
pageid,
TIMESTAMPTOSTRING(viewtime, 'yyyy-MM-dd HH:mm:ss.SSS') AS timestring
FROM pageviews;
CREATE STREAM pageviews_enriched AS
SELECT pv.viewtime,
pv.userid AS userid,
pv.pageid,
pv.timestring,
u.gender,
u.regionid,
u.interests,
u.contactinfo
FROM pageviews_transformed pv
LEFT JOIN users u ON pv.userid = u.userid;
Will the Stream-Table JOIN retain 'UserId' as the ROWKEY in the new stream 'pageviews_enriched'?
4) I have seen several examples from Confluent on GitHub where the stream used in a JOIN is not keyed. But as per the documentation, a stream participating in a JOIN should have a NOT NULL ROWKEY. Please confirm whether the stream needs a NOT NULL ROWKEY.
This concerns both Stream-Stream and Stream-Table joins. In the example below I am performing a JOIN between a stream with a NULL ROWKEY and a table. Is this valid?
CREATE TABLE users
(registertime BIGINT,
userid VARCHAR,
gender VARCHAR,
regionid VARCHAR)
WITH (KAFKA_TOPIC = 'users',
VALUE_FORMAT='JSON',
KEY = 'userid');
CREATE STREAM pageviews
(viewtime BIGINT,
userid VARCHAR,
pageid VARCHAR)
WITH (KAFKA_TOPIC='pageviews',
VALUE_FORMAT='DELIMITED',
TIMESTAMP='viewtime');
CREATE STREAM pageviews_transformed as
SELECT viewtime,
userid,
pageid,
TIMESTAMPTOSTRING(viewtime, 'yyyy-MM-dd HH:mm:ss.SSS') AS timestring
FROM pageviews;
CREATE STREAM pageviews_enriched AS
SELECT pv.viewtime,
pv.userid AS userid,
pv.pageid,
pv.timestring,
u.gender,
u.regionid,
u.interests,
u.contactinfo
FROM pageviews_transformed pv
LEFT JOIN users u ON pv.userid = u.userid;
Upvotes: 0
Views: 1710
Reputation: 32080
CREATE TABLE SALBYDEPT AS SELECT DeptId, SUM(SAL) FROM Employee GROUP BY DeptId;
- Please confirm whether performing Aggregation function on the above Stream requires a KEY on 'Employee' stream ie NOT NULL ROWKEY on 'Employee' Stream
You do not need a key on this stream. The key of the created table will be DeptId.
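One way to check this yourself is to read the key back from the new table. This is a rough sketch that assumes a KSQL version of the same era as the question, where tables expose the ROWKEY pseudo-column; newer ksqlDB releases expect EMIT CHANGES on such a query, so adjust for your version.

```sql
-- Assumes SALBYDEPT was created with the CTAS statement quoted above.
-- ROWKEY should carry the grouping value even though the source
-- stream 'Employee' had no key of its own.
SELECT ROWKEY, DeptId
FROM SALBYDEPT;
```

If ROWKEY and DeptId show the same value on every row, the table is keyed by the GROUP BY column.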
- As per Confluent documentation, "Windowing lets you control how to group records that have the same key for stateful operations, like aggregations or joins, into time spans. KSQL tracks windows per record key". Please help me understand the meaning of the above statement. Is it required the Stream should have NOT NULL KEY?
This means that when you create an aggregation you can do so over a time window, and that time window is part of the message key. For example, instead of aggregating all employee SAL (sales?), you could choose to do so over a time window, perhaps every hour or day. In that case you would have the aggregate key (DeptId), combined with the window key (e.g. for hourly 2019-06-23 06:00:00, 2019-06-23 07:00:00, 2019-06-23 08:00:00, etc.).
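As an illustration, an hourly version of the aggregation might look like the sketch below. The table name SALBYDEPT_HOURLY and the alias TOTAL_SAL are made up for this example, and the one-hour tumbling window is just one possible choice.

```sql
-- Hypothetical hourly variant of the SALBYDEPT aggregation.
-- Each output row is identified by DeptId plus the start of the
-- one-hour window the input records fell into.
CREATE TABLE SALBYDEPT_HOURLY AS
  SELECT DeptId,
         SUM(SAL) AS TOTAL_SAL
  FROM Employee
  WINDOW TUMBLING (SIZE 1 HOUR)
  GROUP BY DeptId;
```

The source stream still does not need a key here. Records are grouped by DeptId and assigned to a window by their timestamp.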
- Will JOIN on Stream-Table retain the KEY
It will retain the stream's key, unless you include a PARTITION BY in the DDL.
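If you want the output keyed on userid whatever the join produces, one option is to repartition explicitly in a follow-up statement. This is a sketch only: the stream name pageviews_enriched_by_user is invented for the example, and it assumes pageviews_enriched already exists as defined in the question.

```sql
-- Hypothetical follow-up stream: re-key the join output on userid.
-- PARTITION BY writes the records to a new topic whose message key
-- is the value of the named column.
CREATE STREAM pageviews_enriched_by_user AS
  SELECT *
  FROM pageviews_enriched
  PARTITION BY userid;
```

This adds one extra topic but makes the key of the result explicit.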
- I have seen several examples from Confluent on Github where Stream used in JOIN is not KEY'ed. But as per the documentation, Stream should have NOT NULL ROWKEY participating the JOIN. Please confirm to have NOT NULL ROWKEY in the Stream.
Do you have a link to the specific documentation you're referencing? Whilst a table does need to be keyed, a stream does not (KSQL may handle this under the covers; I'm not sure).
Upvotes: 2