Big query SQL - Count elements from a string

Question

I have an example string data as below.

+----------------------------+
|          Items             |
|----------------------------|
| A >> B >> C                |
| A >> A >> B                |
| B >> B >> C >> C >> A      |
| B >> B >> B >> C >> A >> D |
+----------------------------+

Take the "Items" in the first row as an example, this "Items" contains three elements, namely A, B and C. I want to calculate the number of each element in order. Besides, I will only focus on 3 elements so I will ignore element 'D' in the last "Items".

Here's what I want:

+----------------------------+---------+----------+---------+----------+---------+----------+
|          Items             |  Item1  | I1_Count |  Item2  | I2_Count |  Item3  | I3_Count |
|----------------------------|---------|----------|---------|----------|---------|----------|
| A >> B >> C                |    A    |     1    |    B    |     1    |    C    |     1    | 
| A >> A >> B                |    A    |     2    |    B    |     1    |   NULL  |    NULL  |
| A >> B >> B >> C >> C      |    A    |     1    |    B    |     1    |    C    |     2    |
| B >> B >> C >> C >> A      |    B    |     2    |    C    |     2    |    A    |     1    |
| B >> B >> B >> C >> A >> D |    B    |     3    |    C    |     1    |    A    |     1    |
+----------------------------+---------+----------+---------+----------+---------+----------+

How can I implement it in SQL(I use Google Bigquery)?

Thanks a million!

Gordon Linoff · Accepted Answer

You can unnest the string and then apply some aggregation logic:

with t as (
      select 'A >> B >> C' as items union all
      select 'A >> A >> B' as items union all
      select 'B >> B >> C >> C >> A' as items union all
      select 'B >> B >> B >> C >> A >> D' as items 
     )
select t.*,
       (select as struct max(case when n = 1 then item end) as item_1,
               max(case when n = 1 then cnt end) as item_1_cnt,
               max(case when n = 2 then item end) as item_2,
               max(case when n = 2 then cnt end) as item_2_cnt,
               max(case when n = 3 then item end) as item_3,
               max(case when n = 3 then cnt end) as item_3_cnt
        from (select item, dense_rank() over (order by min(n)) as n, count(*) as cnt
              from unnest(split(t.items, ' >> ')) item with offset n
              group by item
             ) x
       ).*
from t;

The innermost subquery transforms the string into rows, with a counter. This is then aggregated to order the items by the first time they appear -- along with the count of the item.

Finally, these are aggregated into the different columns that you want.

Big query SQL - Count elements from a string

Answers (2)

Related Questions