i_am_cris
i_am_cris

Reputation: 627

Scan columns for a value and use the result to select other columns

I have one bigquery table with 30 columns:

strcol,strcol1,strcol2,..  startstr, startstr1, startstr2,..   endstr, endstr1, endstr2,..
1111, 2343, 1012......     "car", "boat", "scooter".....       "plane", "bike", "boat"
9999, 1012, 3333......     "scooter", "boat", "scooter".....       "boat", "bike", "boat"

I need to scan all columns "strcol" to find a number. If I find the number I need to use the correspondent "startstr" and "endstr".

Example: Number 1012 is present in column "strcol2" then I need to use "startstr2" and "endstr2". My select statement only needs two columns. Result would be something like:

start, end
scooter, boat
boat, bike

I was thinking of creating an array of strcol...strcol9 and try to find the number and return the index, then use this index to find the correct startstr and endstr. But I don't know how to do this. Maybe there are much better alternatives? Any ideas?

The number 1012 will always be fixed and will never change.

Cheers, Cris

Upvotes: 2

Views: 160

Answers (3)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173013

Below is for BigQuery Standard SQL and might look less "advanced/verbose", so might be easy for you to handle/maintain

#standardSQL
select 
  regexp_extract(line, r'"startstr' || pattern) start,
  regexp_extract(line, r'"endstr' || pattern) `end`,
from `project.dataset.table` t, 
  unnest([to_json_string(t)]) line,
  unnest(generate_array(0, array_length(split(line))/3 - 1)) index,
  unnest([if(index > 0, cast(index as string), '') || '":"?([^,"]+)"?']) pattern
where regexp_extract(line, r'"strcol' || pattern) = cast(1012 as string)

When applied to sample data from your question - output is

enter image description here

Upvotes: 2

Mikhail Berlyant
Mikhail Berlyant

Reputation: 173013

Below is for BigQuery Standard SQL

#standardSQL
select start, `end`
from (
  select 
    max(if(key='strcol', value, null)) as strcol,
    cast(max(if(key='strcol', value, null)) as int64) as value,
    max(if(key='startstr', value, null)) as start,
    max(if(key='endstr', value, null)) as `end`
  from (
    select format('%t', t) id, 
      regexp_extract(col, r'^[^\d]+') key,
      regexp_extract(col, r'[\d]*$') offset,
      value
    from `project.dataset.table` t,
    unnest(split(translate(to_json_string(t), '{}"', ''))) as kv,
    unnest([struct(split(kv, ':')[offset(0)] as col, split(kv, ':')[offset(1)] as value)])
  )
  group by id, offset
)
where value = 1012   

if to apply to sample data from your question - output is

enter image description here

Upvotes: 1

Sergey Geron
Sergey Geron

Reputation: 10172

SELECT (
  CASE
    WHEN strcol = 1012 THEN STRUCT(startstr, endstr)
    WHEN strcol1 = 1012 THEN STRUCT(startstr1, endstr1)
    WHEN strcol2 = 1012 THEN STRUCT(startstr2, endstr2)
  END
).*
FROM dataset.table
WHERE 1012 IN (strcol, strcol1, strcol2)

Upvotes: 1

Related Questions