Mayank Soni
Mayank Soni

Reputation: 344

Bigquery - How to select top 50 elements based on a aggregated column, grouped over 3 columns

Example table

FN LN City id E
may son lalitpur 1 a
may son lalitpur 2 c
may son lalitpur 3 a
may son jhansi 4 a
may son jhansi 11 a
may son varansi 4 a
may son kanpur 11 a
may ton jhansi 4 a
day son jhansi 11 a

I need to find top x cities based on distinct id for each unique pair of FN,LN columns.

Output if selecting top 2 values

FN LN City unique Ids
may son lalitpur 4
may son jhansi. 2
may ton jhansi. 1
day son jhansi. 1

With following sql I have got the data but I can't filter based on uniquq ids

           SELECT
                FN,
                LN,
                City,
                count(distinct(userId)) as unique_ids
            FROM
                 tableName
            WHERE
                <some condition>
            group by
                1,
                2,
                3

Upvotes: 0

Views: 130

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1269513

You can use aggregation with window functions like this:

select x.* except (seqnum)
from (select fn, ln, city, count(distinct id) as unique_ids,
             row_number() over (partition by fn, ln order by count(distinct id) desc) as seqnum
      from tableName t
      group by fn, ln, city
     ) x
where seqnum <= 50;

A simpler way to write this uses the very recently introduced qualify clause:

select fn, ln, city, count(distinct id) as unique_ids
from tableName t
group by fn, ln, city
qualify row_number() over (partition by fn, ln order by count(distinct id) desc) <= 50;

Or:

select fn, ln, city, count(distinct id) as unique_ids
from tableName t
group by fn, ln, city
qualify row_number() over (partition by fn, ln order by unique_ids desc) <= 50;

Upvotes: 2

Related Questions