JohnsonLee
JohnsonLee

Reputation: 75

Big query SQL - Count elements from a string

I have an example string data as below.

+----------------------------+
|          Items             |
|----------------------------|
| A >> B >> C                |
| A >> A >> B                |
| B >> B >> C >> C >> A      |
| B >> B >> B >> C >> A >> D |
+----------------------------+

Take the "Items" in the first row as an example, this "Items" contains three elements, namely A, B and C. I want to calculate the number of each element in order. Besides, I will only focus on 3 elements so I will ignore element 'D' in the last "Items".

Here's what I want:

+----------------------------+---------+----------+---------+----------+---------+----------+
|          Items             |  Item1  | I1_Count |  Item2  | I2_Count |  Item3  | I3_Count |
|----------------------------|---------|----------|---------|----------|---------|----------|
| A >> B >> C                |    A    |     1    |    B    |     1    |    C    |     1    | 
| A >> A >> B                |    A    |     2    |    B    |     1    |   NULL  |    NULL  |
| A >> B >> B >> C >> C      |    A    |     1    |    B    |     1    |    C    |     2    |
| B >> B >> C >> C >> A      |    B    |     2    |    C    |     2    |    A    |     1    |
| B >> B >> B >> C >> A >> D |    B    |     3    |    C    |     1    |    A    |     1    |
+----------------------------+---------+----------+---------+----------+---------+----------+

How can I implement it in SQL(I use Google Bigquery)?

Thanks a million!

Upvotes: 1

Views: 1337

Answers (2)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172944

Below is for BigQuery Standard SQL

#standardSQL
select items,
  arr[offset(0)].item as Item1,
  arr[offset(0)].cnt as I1_Count,
  arr[safe_offset(1)].item as Item2,
  arr[safe_offset(1)].cnt as I2_Count,
  arr[safe_offset(2)].item as Item3,
  arr[safe_offset(2)].cnt as I3_Count
from (
  select items, 
    array(
      select as struct item, count(*) as cnt
      from unnest(split(Items, ' >> ')) item with offset
      group by item
      order by min(offset)
    ) arr
  from `project.dataset.table`
)    

if to apply above to sample data from your question - output is

enter image description here

Upvotes: 2

Gordon Linoff
Gordon Linoff

Reputation: 1269493

You can unnest the string and then apply some aggregation logic:

with t as (
      select 'A >> B >> C' as items union all
      select 'A >> A >> B' as items union all
      select 'B >> B >> C >> C >> A' as items union all
      select 'B >> B >> B >> C >> A >> D' as items 
     )
select t.*,
       (select as struct max(case when n = 1 then item end) as item_1,
               max(case when n = 1 then cnt end) as item_1_cnt,
               max(case when n = 2 then item end) as item_2,
               max(case when n = 2 then cnt end) as item_2_cnt,
               max(case when n = 3 then item end) as item_3,
               max(case when n = 3 then cnt end) as item_3_cnt
        from (select item, dense_rank() over (order by min(n)) as n, count(*) as cnt
              from unnest(split(t.items, ' >> ')) item with offset n
              group by item
             ) x
       ).*
from t;

The innermost subquery transforms the string into rows, with a counter. This is then aggregated to order the items by the first time they appear -- along with the count of the item.

Finally, these are aggregated into the different columns that you want.

Upvotes: 2

Related Questions