Reputation: 1
I have a Kafka topic whose messages are comma-separated JSON (keys and values). I also have a ClickHouse cluster running in Docker containers. I can stream-ingest the messages from the Kafka topic into ClickHouse with the ClickHouse Kafka table engine (create a destination table with the MergeTree engine, a Kafka table engine with settings pointing to my Kafka bootstrap servers and topic, and a materialized view linking the two), and it works fine.
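For context, here is a minimal sketch of that three-part setup; the table names, columns, and broker address are placeholders, not my real schema:

    -- Kafka engine table: consumes the topic
    CREATE TABLE events_queue
    (
        ts DateTime,
        user_id UInt64,
        value Float64
    )
    ENGINE = Kafka
    SETTINGS kafka_broker_list = 'kafka:9092',
             kafka_topic_list = 'events',
             kafka_group_name = 'clickhouse_consumer',
             kafka_format = 'JSONEachRow';

    -- Destination table: where the data is stored
    CREATE TABLE events
    (
        ts DateTime,
        user_id UInt64,
        value Float64
    )
    ENGINE = MergeTree
    ORDER BY ts;

    -- Materialized view: links the two
    CREATE MATERIALIZED VIEW events_mv TO events AS
    SELECT ts, user_id, value FROM events_queue;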
The problem is that I have to define all the fields manually in the CREATE TABLE command, while there are many fields in each message and new fields may be added to the messages at any time. So I want the ingestion process to be dynamic about the fields and to add the necessary columns to the ClickHouse table whenever it finds new fields in the messages. I also want it to automatically infer the data type of each new column from the type of the values in the message.
I tried running SET input_format_skip_unknown_fields = 0 in the ClickHouse client, but it didn't result in any changes.
I would prefer to keep using the ClickHouse Kafka table engine, as it is much simpler to use and doesn't need a separate deployment, but if what I'm after can't be achieved with this option, I'm open to any other approach that gives me automatic parsing with data-type inference.
Upvotes: 0
Views: 697
Reputation: 86
My recommendation here is to write the JSON events to a "raw" table as a single String field (perhaps extracting the timestamp and other stable fields up front), then derive specific materialized views that pull out the fields you need with the JSONExtract* functions.
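A minimal sketch of this, with hypothetical table and field names: the JSONAsString format makes the Kafka engine deliver each message as one String value, and the JSONExtract* calls in the downstream view extract whichever fields you care about.

    -- Kafka engine table: one String column holding the whole JSON message
    CREATE TABLE events_queue
    (
        raw String
    )
    ENGINE = Kafka
    SETTINGS kafka_broker_list = 'kafka:9092',
             kafka_topic_list = 'events',
             kafka_group_name = 'clickhouse_consumer',
             kafka_format = 'JSONAsString';

    -- Raw table: keeps the full event, plus a stable field extracted up front
    CREATE TABLE events_raw
    (
        ts  DateTime,
        raw String
    )
    ENGINE = MergeTree
    ORDER BY ts;

    CREATE MATERIALIZED VIEW events_raw_mv TO events_raw AS
    SELECT
        parseDateTimeBestEffort(JSONExtractString(raw, 'timestamp')) AS ts,
        raw
    FROM events_queue;

    -- Field-specific view: add or rebuild views like this as new fields appear
    CREATE MATERIALIZED VIEW events_extracted_mv
    ENGINE = MergeTree ORDER BY ts AS
    SELECT
        ts,
        JSONExtractString(raw, 'user_id') AS user_id,
        JSONExtractFloat(raw, 'value')    AS value
    FROM events_raw;

A new field in the messages then only requires a new (or rebuilt) view over data that is already sitting in events_raw, rather than any change to the Kafka pipeline itself.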
Upvotes: 1