Jonathan
Jonathan

Reputation: 2024

Apply argMax over multiple columns in Clickhouse

My Clickhouse table has a primary key column (pk), an insert timestamp column (insert_ts), and a bunch of data columns. I'd like to get the latest value for each data column. My query could look like this:

SELECT pk, argMax(data1, insert_ts), argMax(data2, insert_ts), ... GROUP BY pk

That's very verbose, and I'd prefer to use a wildcard with EXCEPT/APPLY like this:

SELECT * EXCEPT(insert_ts) APPLY(argMax) GROUP BY pk

But I have no way to specify the second argument to argMax. Any ideas?

I'm interested both in answers to this specific question, and also to answers to the XY-problem that propose a different way to structure my table.

Upvotes: 3

Views: 4056

Answers (1)

Jonathan
Jonathan

Reputation: 2024

Thanks to the Clickhouse team for implementing a solution really quickly!

SELECT * EXCEPT(insert_ts) APPLY(x->argMax(x,insert_ts)) GROUP BY pk

For the XY problem, Clickhouse offers the ReplacingMergeTree engine for explicitly keeping only the latest rows:

Upvotes: 3

Related Questions