Reputation: 2024
My Clickhouse table has a primary key column (pk
), an insert timestamp column (insert_ts
), and a bunch of data columns. I'd like to get the latest value for each data column. My query could look like this:
SELECT pk, argMax(data1, insert_ts), argMax(data2, insert_ts), ... GROUP BY pk
That's very verbose, and I'd prefer to use a wildcard with EXCEPT/APPLY like this:
SELECT * EXCEPT(insert_ts) APPLY(argMax) GROUP BY pk
But I have no way to specify the second argument to argMax
. Any ideas?
I'm interested both in answers to this specific question, and also to answers to the XY-problem that propose a different way to structure my table.
Upvotes: 3
Views: 4056
Reputation: 2024
Thanks to the Clickhouse team for implementing a solution really quickly!
SELECT * EXCEPT(insert_ts) APPLY(x->argMax(x,insert_ts)) GROUP BY pk
For the XY problem, Clickhouse offers the ReplacingMergeTree engine for explicitly keeping only the latest rows:
Upvotes: 3