Big query SQL - Count elements from a string but in order

Question

I have an example string data as below.

+----------------------------+
|          Items             |
|----------------------------|
| A >> B >> C                |
| A >> A >> B                |
| B >> C >> B >> B >> A      |
| B >> B >> C >> C >> A      |
| B >> A >> B >> C >> A >> D |
+----------------------------+

Take the "Items" in the first row as an example, this "Items" contains three elements, namely A, B and C. I want to calculate the number of each element in order. Besides, I only focus on 3 elements so I ignore 'C >> A >> D' in the final "Items".

Here's what I want:

+----------------------------+---------+----------+---------+----------+---------+----------+
|          Items             |  Item1  | I1_Count |  Item2  | I2_Count |  Item3  | I3_Count |
|----------------------------|---------|----------|---------|----------|---------|----------|
| A >> B >> C                |    A    |     1    |    B    |     1    |    C    |     1    | 
| A >> A >> B                |    A    |     2    |    B    |     1    |   NULL  |    NULL  |
| B >> C >> B >> B >> A      |    B    |     1    |    C    |     1    |    B    |     2    |
| B >> B >> C >> C >> A      |    B    |     2    |    C    |     2    |    A    |     1    |
| B >> A >> B >> C >> A >> D |    B    |     1    |    A    |     1    |    B    |     1    |
+----------------------------+---------+----------+---------+----------+---------+----------+

How can I implement it in SQL(I use Google Bigquery)?

Thanks a million!

Mikhail Berlyant · Accepted Answer

Below is for BigQuery Standard SQL (and is just simple adjustment of the solution for your previous question)

#standardSQL
select items,
  arr[offset(0)].item as Item1,
  arr[offset(0)].cnt as I1_Count,
  arr[safe_offset(1)].item as Item2,
  arr[safe_offset(1)].cnt as I2_Count,
  arr[safe_offset(2)].item as Item3,
  arr[safe_offset(2)].cnt as I3_Count
from (
  select items, 
    array(select as struct item, count(1) cnt
    from unnest(
    array(select as struct item, countif(not flag) over(order by offset) grp
    from unnest(array(
        select as struct item, offset, item = lag(item) over(order by offset) flag
        from unnest(split(Items, ' >> ')) item with offset
      )))
    ) 
    group by item, grp
    order by grp
    ) arr  
  from `project.dataset.table`
)

if to apply to sample data from your question - output is

Big query SQL - Count elements from a string but in order

Answers (2)

Related Questions