Reputation: 678
I have transaction data table which is in user level, where user buy different products day to day. I want to aggregate the total transaction count for per user by pivoting the columns with same prefix. meaning, per user, total time of transaction for pivoted columns. it should be conditioned on date as well.
reproducible data and attempt
here is reproducible data schema:
CREATE OR REPLACE TABLE transaction_tble (
user_id char(10),
transaction_date, date,
catg_prod1_amt int(10),
catg_prod2_amt int(10),
catg_prod3_amt int(10),
sku_item1_qty int(10),
sku_item2_qty int(10),
sku_item3_qty int(10)
)
here is the value of the table:
here is the sample data:
user_id,transaction_date,sku_sale_prod1,sku_sale_prod2,sku_sale_prod3,catg_purch_prod3,catg_purch_prod4,catg_purch_prod5
A,1/14/2022,10,0,0,0,11,0
B,1/10/2022,0,18,0,5,11,7
A,1/19/2022,6,18,2,5,0,0
A,1/19/2022,10,18,1,5,11,7
B,1/19/2022,10,18,1,5,11,7
C,1/20/2022,10,18,1,5,11,7
C,1/20/2022,10,18,1,5,11,7
B,4/19/2022,10,18,1,5,11,7
A,5/23/2022,10,18,1,5,11,7
A,7/23/2022,10,18,1,5,11,7
my attempt
I think using pivot operation may be right approach but it doesn't work
CREATE OR REPLACE TABLE trans_view (
select
user_id,
( case when transaction_date between '2022-01-01' - INTERVAL 6 MONTH then count(distinct transaction_date) end) as 2022_h1_trans_cnt,
( case when transaction_date between '2022-06-01' - INTERVAL 6 MONTH then count(distinct transaction_date) end) as 2022_h2_trans_cnt,
-- using pivot here
( case
when transaction_date between '2022-01-01' - INTERVAL 6 MONTH
and
PIVOT (
count(distinct transaction_date)
for sku in (
'sku_sale_prod1','sku_sale_prod2','sku_sale_prod3'
)
)
then count(distinct transaction_date)
end
) as 2022_h1_trans_cnt_sku,
--
( case
when transaction_date between '2022-01-01' - INTERVAL 6 MONTH
and
PIVOT (
count(distinct transaction_date)
for catg in (
'catg_purch_prod3','catg_purch_prod4','catg_purch_prod5'
)
)
then count(distinct transaction_date)
end
) as 2022_h1_trans_cnt_catg
from transaction_table
group by user_id
)
I looked into pivot function in big query, not sure what went wrong with my logic. Can anyone suggest possible workaround to do this?
desired output
here is my desired output table schema:
CREATE OR REPLACE VIEW transc_view as(
select
user_id,
2022_h1_trans_cnt,
2022_h2_trans_cnt,
2022_h1_trans_cnt_catg,
2022_h2_trans_cnt_catg,
2022_h1_trans_cnt_sku,
2022_h2_trans_cnt_sku
from transaction_tble
group by user_id
)
here is desired value of the output table:
user_id,2022_1st_6mth_trns_cnt,2022_2nd_6mth_trns_cnt,2022_1st_6mth_trns_cnt_sku,2022_2nd_6mth_trns_cnt_sku,2022_1st_6mth_trns_cnt_catg,2022_2nd_6mth_trns_cnt_catg
A,4,1,3,,3,1
B,3,0,3,0,3,0
C,1,1,1,0,1,0
How can I correctly pivot the table and get distinct transaction date count for this? any thoughts or quick workaround on this?
Upvotes: 1
Views: 581
Reputation: 12274
Except 2022_h1_trans_cnt
and 2022_h2_trans_cnt
in your expected outout, you can consider below.
WITH stacked AS (
SELECT *,
'y' || EXTRACT(YEAR FROM tx_date) || '_' ||
CASE
WHEN EXTRACT(QUARTER FROM tx_date) IN (1, 2) THEN 'h1' -- 1st, 2nd Quarter
WHEN EXTRACT(QUARTER FROM tx_date) IN (3, 4) THEN 'h2' -- 3rd, 4th Quarter
END AS period_name,
CASE
WHEN STARTS_WITH(metric, 'sku_sale_prod') THEN 'trns_cnt_sku'
WHEN STARTS_WITH(metric, 'catg_purch_prod') THEN 'trns_cnt_catg'
END AS metric_name,
FROM (
SELECT *, PARSE_DATE('%m/%d/%Y', transaction_date) tx_date
FROM transaction_table
UNPIVOT (value FOR metric IN (
catg_purch_prod3, catg_purch_prod4, catg_purch_prod5,
sku_sale_prod1, sku_sale_prod2, sku_sale_prod3
)
)
)
),
aggregation AS (
SELECT user_id, period_name || '_' || metric_name AS tx_period_metric,
COUNT(DISTINCT tx_date) cnt
FROM stacked
GROUP BY 1, 2
)
SELECT user_id,
y2022_h1_trns_cnt_sku AS y2022_h1_trns_cnt,
y2022_h2_trns_cnt_sku AS y2022_h2_trans_cnt,
* EXCEPT(user_id)
FROM aggregation
PIVOT (SUM(cnt) FOR tx_period_metric IN (
'y2022_h1_trns_cnt_sku', 'y2022_h2_trns_cnt_sku',
'y2022_h1_trns_cnt_catg', 'y2022_h2_trns_cnt_catg'
)
)
;
Query results
2022_h1_trans_cnt
and 2022_h2_trans_cnt
are same value as that of y2022_h1_trns_cnt_sku
, y2022_h2_trns_cnt_sku
respectively. Correct me if I'm wrong.Upvotes: 2