Reputation: 1
I have a table called event_source which holds a lot of data, a mix of data from multiple sources produced by multiple teams, and I want to create tables specific to each team.
This table gets data from Kafka and stores the whole JSON payload as a string.
CREATE MATERIALIZED VIEW events.event_source_mv TO events.event_source
(
    `message` String
)
AS SELECT message AS message
FROM events.kafka_engine;

CREATE TABLE events.event_source
(
    `ingest_time` DateTime DEFAULT now() CODEC(LZ4HC(9)),
    `message` String CODEC(LZ4HC(9))
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(ingest_time)
ORDER BY ingest_time
TTL toDateTime(ingest_time) + toIntervalMonth(6)
SETTINGS index_granularity = 8192, storage_policy = 'hot_to_cold';
I am trying to create a MATERIALIZED VIEW like the following, which picks up new incoming data:
CREATE MATERIALIZED VIEW event_source_business1_mv TO event_source_business1
AS SELECT
ingest_time as ingest_time,
JSONExtractString(message,'source') as source,
JSONExtractString(message,'eventName') as event_name,
JSONExtractString(JSONExtractString(message,'entityContext'),'type') as ec_type,
JSONExtractString(JSONExtractString(message,'additionalInfo'),'ip_address') as ai_ip_address
FROM event_source;
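The target table event_source_business1 would need columns matching the SELECT above; a minimal sketch, assuming the same partitioning, TTL and storage policy as event_source (adjust to your needs):

CREATE TABLE event_source_business1
(
    `ingest_time` DateTime CODEC(LZ4HC(9)),
    `source` String,
    `event_name` String,
    `ec_type` String,
    `ai_ip_address` String
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(ingest_time)
ORDER BY ingest_time
TTL toDateTime(ingest_time) + toIntervalMonth(6)
SETTINGS index_granularity = 8192, storage_policy = 'hot_to_cold';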
What is the best way to backfill the new table with the old data, i.e. the data already present in the source table, given that the source table is close to 6 TB uncompressed spanning 3 months? I would be glad to provide extra details if needed.
I tried using an insert (something along the lines of insert into event_source_business1 from event_source where date>3months), but it fails partway through after some time. Is there a better way?
Upvotes: 0
Views: 335
Reputation: 29
INSERT ... SELECT is the way to go for moving data.
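For example, a one-shot backfill could look like the sketch below; the extraction expressions mirror the materialized view, and the WHERE cutoff is a placeholder so the backfill does not overlap with rows the new view is already inserting:

INSERT INTO event_source_business1
SELECT
    ingest_time,
    JSONExtractString(message, 'source') AS source,
    JSONExtractString(message, 'eventName') AS event_name,
    JSONExtractString(JSONExtractString(message, 'entityContext'), 'type') AS ec_type,
    JSONExtractString(JSONExtractString(message, 'additionalInfo'), 'ip_address') AS ai_ip_address
FROM event_source
WHERE ingest_time < '2024-01-01 00:00:00'; -- placeholder: use the creation time of event_source_business1_mv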
OFFSET and LIMIT could be leveraged to slice the big chunk into smaller pieces.
https://clickhouse.com/docs/en/operations/settings/settings#offset
I would probably run some loops to do the backfill.
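A sketch of the sliced version: each statement copies one fixed-size chunk, and a small client-side loop would re-run it with an increasing OFFSET until no more rows are copied. The chunk size of 10 million rows is an arbitrary example; a stable ORDER BY keeps the slices from overlapping, and ingest_time is cheap to order by here because it is the table's sort key.

-- chunk 0 (next iterations use OFFSET 10000000, 20000000, ...)
INSERT INTO event_source_business1
SELECT
    ingest_time,
    JSONExtractString(message, 'source') AS source,
    JSONExtractString(message, 'eventName') AS event_name,
    JSONExtractString(JSONExtractString(message, 'entityContext'), 'type') AS ec_type,
    JSONExtractString(JSONExtractString(message, 'additionalInfo'), 'ip_address') AS ai_ip_address
FROM event_source
WHERE ingest_time < '2024-01-01 00:00:00' -- same placeholder cutoff as above
ORDER BY ingest_time
LIMIT 10000000 OFFSET 0;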
Upvotes: 0