Reputation: 59
i have a table in bigquery like this (260000 rows):
vendor date item_price discount_price
x 2021-07-08 23:41:10 451,5 0
y 2021-06-14 10:22:10 41,7 0
z 2020-01-03 13:41:12 74 4
s 2020-04-12 01:14:58 88 12
....
exactly what I want is to group this data by month and find the sum of the sales of only the top 20 vendors in that month. Expected output:
month vendor_name(top20) sum_of_vendor's_sales sum_of_vendor's_discount item_count(sold)
2020-01 x1 10857 250 150
2020-01 x2 9685 410 50
2020-01 x3 3574 140 45
....
2021 01 x20 700 15 20
2020-02 y1 7421 280 120
2020-02 y2 6500 250 40
2020-02 y3 4500 200 70
.....
2020-02 y20 900 70 30
i tried this (source here). But The desired output could not be obtained.
select month,
(select sum(sum) from t.top_20_vendors) as sum_of_only_top20_vendor_sales
from (
select
format_datetime('%Y%m', date) month,
approx_top_sum(vendor, item_price, 20) top_20_vendors,count(item_price) as count_of_items,sum(discount_price)
from my_table
group by month
) t
Upvotes: 0
Views: 43
Reputation: 172994
Consider below approach
select
format_datetime('%Y%m', date) month,
vendor as vendor_name_top20,
sum(item_price) as sum_of_vendor_sales,
sum(discount_price) as sum_of_vendor_discount,
count(*) as item_count_sold
from your_table
group by vendor, month
qualify row_number() over(partition by month order by sum_of_vendor_sales desc) <= 20
Upvotes: 1