Reputation: 75
I have an example string data as below.
+----------------------------+
| Items |
|----------------------------|
| A >> B >> C |
| A >> A >> B |
| B >> B >> C >> C >> A |
| B >> B >> B >> C >> A >> D |
+----------------------------+
Take the "Items" in the first row as an example, this "Items" contains three elements, namely A, B and C. I want to calculate the number of each element in order. Besides, I will only focus on 3 elements so I will ignore element 'D' in the last "Items".
Here's what I want:
+----------------------------+---------+----------+---------+----------+---------+----------+
| Items | Item1 | I1_Count | Item2 | I2_Count | Item3 | I3_Count |
|----------------------------|---------|----------|---------|----------|---------|----------|
| A >> B >> C | A | 1 | B | 1 | C | 1 |
| A >> A >> B | A | 2 | B | 1 | NULL | NULL |
| A >> B >> B >> C >> C | A | 1 | B | 1 | C | 2 |
| B >> B >> C >> C >> A | B | 2 | C | 2 | A | 1 |
| B >> B >> B >> C >> A >> D | B | 3 | C | 1 | A | 1 |
+----------------------------+---------+----------+---------+----------+---------+----------+
How can I implement it in SQL(I use Google Bigquery)?
Thanks a million!
Upvotes: 1
Views: 1337
Reputation: 172944
Below is for BigQuery Standard SQL
#standardSQL
select items,
arr[offset(0)].item as Item1,
arr[offset(0)].cnt as I1_Count,
arr[safe_offset(1)].item as Item2,
arr[safe_offset(1)].cnt as I2_Count,
arr[safe_offset(2)].item as Item3,
arr[safe_offset(2)].cnt as I3_Count
from (
select items,
array(
select as struct item, count(*) as cnt
from unnest(split(Items, ' >> ')) item with offset
group by item
order by min(offset)
) arr
from `project.dataset.table`
)
if to apply above to sample data from your question - output is
Upvotes: 2
Reputation: 1269493
You can unnest the string and then apply some aggregation logic:
with t as (
select 'A >> B >> C' as items union all
select 'A >> A >> B' as items union all
select 'B >> B >> C >> C >> A' as items union all
select 'B >> B >> B >> C >> A >> D' as items
)
select t.*,
(select as struct max(case when n = 1 then item end) as item_1,
max(case when n = 1 then cnt end) as item_1_cnt,
max(case when n = 2 then item end) as item_2,
max(case when n = 2 then cnt end) as item_2_cnt,
max(case when n = 3 then item end) as item_3,
max(case when n = 3 then cnt end) as item_3_cnt
from (select item, dense_rank() over (order by min(n)) as n, count(*) as cnt
from unnest(split(t.items, ' >> ')) item with offset n
group by item
) x
).*
from t;
The innermost subquery transforms the string into rows, with a counter. This is then aggregated to order the items by the first time they appear -- along with the count of the item.
Finally, these are aggregated into the different columns that you want.
Upvotes: 2