maraujop
maraujop

Reputation: 4594

Calculate aggregated maximum by two columns in bigquery

I have requests logs in a Bigquery dataset with the following simplified format:

timestamp |  text    |   machine_type
2021-06-03 12:49:22.198412 UTC  | <request>...</request> | small
2021-06-03 12:46:21.198412 UTC  | <request>...</request> | medium
2021-06-03 12:52:29.198412 UTC  | Connection intialization | medium
2021-06-03 12:12:12.198412 UTC  | Error: xxx | big

I need to get the second with the maximum number of requests per second for every type of machine there is and which second was the one with the highest load. Something like:

timestamp | maximum_number_of_requests | machine_type
2021-06-03 12:46:21.198412 UTC | 232 | medium
2021-02-03 11:29:12.193311 UTC  | 15 | small

I don't find a way to group by requests, count them and max them, while including the timestamp at the same time.

Upvotes: 0

Views: 118

Answers (2)

Gordon Linoff
Gordon Linoff

Reputation: 1269773

This is a good place to use qualify:

select timestamp_trunc(timestamp, second) as timestamp,
       count(*) as number_of_requests,
       machine_type
from mytable
group by 1, 3
qualify row_number() over (partition by machine_type order by count(*) desc) = 1;

Upvotes: 1

Sergey Geron
Sergey Geron

Reputation: 10172

Try this one:

select array_agg(t order by number_of_requests desc limit 1)[offset(0)].*
from (
  select
    timestamp_trunc(timestamp, second) as timestamp,
    count(*) as number_of_requests,
    machine_type
  from mytable
  group by 1, 3
) t
group by t.machine_type

Upvotes: 1

Related Questions