Reputation: 2876
Here is a sample of the data table I have :
Account Date Items
Account1 2018-01-01 249
Account1 2018-01-02 298
Account1 2018-01-03 297
Account1 2018-01-04 263
Account2 2018-02-05 251
Account2 2018-02-12 249
Account2 2018-02-13 298
Account2 2018-03-14 297
Account2 2018-05-23 263
Account2 2018-05-24 251
etc...
I've data for more than 100 accounts. Number of items are available everyday for each account over the last 10 years.
Using the following query I have a list of 7 days moving average value classified in the bucket I've created with the CASE function :
SELECT Account,
CASE
WHEN max(mov_avg_7d) <1000 THEN "less than 1k"
WHEN max(mov_avg_7d) >=1000 AND max(mov_avg_7d) <2500 THEN "1k"
WHEN max(mov_avg_7d) >=2500 AND max(mov_avg_7d) <5000 THEN "2.5K"
WHEN max(mov_avg_7d) >=5000 AND max(mov_avg_7d) <10000 THEN "5k"
WHEN max(mov_avg_7d) >=10000 AND max(mov_avg_7d) <20000 THEN "10k"
WHEN max(mov_avg_7d) >=20000 AND max(mov_avg_7d) <30000 THEN "20k"
WHEN max(mov_avg_7d) >=30000 AND max(mov_avg_7d) <40000 THEN "30k"
WHEN max(mov_avg_7d) >=40000 AND max(mov_avg_7d) <50000 THEN "40k"
WHEN max(mov_avg_7d) >=50000 AND max(mov_avg_7d) <60000 THEN "50k"
WHEN max(mov_avg_7d) >=60000 AND max(mov_avg_7d) <70000 THEN "60k"
WHEN max(mov_avg_7d) >=70000 AND max(mov_avg_7d) <90000 THEN "70k"
WHEN max(mov_avg_7d) >=90000 AND max(mov_avg_7d) <100000 THEN "90k"
WHEN max(mov_avg_7d) >=100000 THEN "100k"
ELSE "error"
END AS status
FROM (
SELECT Account,date,Items,
AVG(Items) OVER (PARTITION BY Account ORDER BY UNIX_DATE(date) RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) AS mov_avg_7d
FROM `my_big_table_here`
)
group by Account
I would like to have the max(mov_avg_7d) for each bucket and for each account. In other words I would like to build the same query but having something like group by Account,status
I'm not really confortable explaining all this. I hope my question is clear.
EDIT :
My output would be something like this :
Account Status Items_max
Account1 less than 1k 249
Account1 1k 1500
Account1 2.5K 2400
Account2 less than 1k 133
Account2 30k 25000
Account2 50k 49000
etc...
Upvotes: 0
Views: 636
Reputation: 229
I'm assuming you want to get the following result:
Account Status Max
Account1 less than 1k 999
Account1 1k 2499
...
Account2 less than 1k 888
Account2 1k 2488
...
Also, I'm still new to BigQuery and SQL, so do correct me if the query doesn't work or if there are other ways of solving this problem.
My code is:
CREATE TEMP FUNCTION STATUS(mov_avg_7d FLOAT64) AS (
CASE
WHEN mov_avg_7d <1000 THEN "less than 1k"
WHEN mov_avg_7d >=1000 AND mov_avg_7d <2500 THEN "1k"
WHEN mov_avg_7d >=2500 AND mov_avg_7d <5000 THEN "2.5K"
WHEN mov_avg_7d >=5000 AND mov_avg_7d <10000 THEN "5k"
WHEN mov_avg_7d >=10000 AND mov_avg_7d <20000 THEN "10k"
WHEN mov_avg_7d >=20000 AND mov_avg_7d <30000 THEN "20k"
WHEN mov_avg_7d >=30000 AND mov_avg_7d <40000 THEN "30k"
WHEN mov_avg_7d >=40000 AND mov_avg_7d <50000 THEN "40k"
WHEN mov_avg_7d >=50000 AND mov_avg_7d <60000 THEN "50k"
WHEN mov_avg_7d >=60000 AND mov_avg_7d <70000 THEN "60k"
WHEN mov_avg_7d >=70000 AND mov_avg_7d <90000 THEN "70k"
WHEN mov_avg_7d >=90000 AND mov_avg_7d <100000 THEN "90k"
WHEN mov_avg_7d >=100000 THEN "100k"
ELSE "error"
END);
SELECT Account, STATUS(mov_avg_7d) as status, max(mov_avg_7d) as max
FROM (
SELECT Account,date,Items,
AVG(Items) OVER (PARTITION BY Account ORDER BY UNIX_DATE(date) RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) AS mov_avg_7d
FROM `my_big_table_here`
)
group by Account, status
Hope this helps!
Explanation:
Inner subquery:
SELECT Account,date,Items,
AVG(Items) OVER (PARTITION BY Account ORDER BY UNIX_DATE(date) RANGE BETWEEN 6 PRECEDING AND CURRENT ROW) AS mov_avg_7d
FROM `my_big_table_here`
gives you the following result (inner result table):
Account Date Items mov_avg_7d
Account1 01012018 xx xxx
Account1 02012018 xx xxx
Account1 03012018 xx xxx
...
Account2 01012018 xx xxx
Account2 02012018 xx xxx
Account2 03012018 xx xxx
...
As for the outer subquery:
SELECT Account, STATUS(mov_avg_7d) as status, max(mov_avg_7d) as max
FROM (
<inner subquery>
)
group by Account, status
STATUS() is a user defined function that takes the value of mov_avg_7d for each row of the inner result table and put it into the respective category.
Let's separate the SELECT statement to make things easier. Example 1:
SELECT Account, max(mov_avg_7d) as max
...
GROUP BY Account
gives the maximum 7 day rolling average for each account, regardless of the status. ie:
Account Max
Account1 99199
Account2 82849
...
Then, example 2:
SELECT STATUS(mov_avg_7d) as status, max(mov_avg_7d) as max
...
GROUP BY status
gives the maximum 7 day rolling average for each status, regardless of the account. ie:
Status Max
less than 1k 899
1k 2488
2.5k 4500
...
So if you want to get the maximum 7 day rolling average for each account, each status, you simply SELECT and GROUP BY both account and status.
I hope this explanation helps, I'm not sure how else to explain to make things clearer for you as I'm still quite new to SQL too. Feel free to comment if you have any doubts and we can work on them together! :)
Upvotes: 2