Reputation: 627
I have one bigquery table with 30 columns:
strcol,strcol1,strcol2,.. startstr, startstr1, startstr2,.. endstr, endstr1, endstr2,..
1111, 2343, 1012...... "car", "boat", "scooter"..... "plane", "bike", "boat"
9999, 1012, 3333...... "scooter", "boat", "scooter"..... "boat", "bike", "boat"
I need to scan all columns "strcol" to find a number. If I find the number I need to use the correspondent "startstr" and "endstr".
Example: Number 1012 is present in column "strcol2" then I need to use "startstr2" and "endstr2". My select statement only needs two columns. Result would be something like:
start, end
scooter, boat
boat, bike
I was thinking of creating an array of strcol...strcol9 and try to find the number and return the index, then use this index to find the correct startstr and endstr. But I don't know how to do this. Maybe there are much better alternatives? Any ideas?
The number 1012 will always be fixed and will never change.
Cheers, Cris
Upvotes: 2
Views: 160
Reputation: 173013
Below is for BigQuery Standard SQL and might look less "advanced/verbose", so might be easy for you to handle/maintain
#standardSQL
select
regexp_extract(line, r'"startstr' || pattern) start,
regexp_extract(line, r'"endstr' || pattern) `end`,
from `project.dataset.table` t,
unnest([to_json_string(t)]) line,
unnest(generate_array(0, array_length(split(line))/3 - 1)) index,
unnest([if(index > 0, cast(index as string), '') || '":"?([^,"]+)"?']) pattern
where regexp_extract(line, r'"strcol' || pattern) = cast(1012 as string)
When applied to sample data from your question - output is
Upvotes: 2
Reputation: 173013
Below is for BigQuery Standard SQL
#standardSQL
select start, `end`
from (
select
max(if(key='strcol', value, null)) as strcol,
cast(max(if(key='strcol', value, null)) as int64) as value,
max(if(key='startstr', value, null)) as start,
max(if(key='endstr', value, null)) as `end`
from (
select format('%t', t) id,
regexp_extract(col, r'^[^\d]+') key,
regexp_extract(col, r'[\d]*$') offset,
value
from `project.dataset.table` t,
unnest(split(translate(to_json_string(t), '{}"', ''))) as kv,
unnest([struct(split(kv, ':')[offset(0)] as col, split(kv, ':')[offset(1)] as value)])
)
group by id, offset
)
where value = 1012
if to apply to sample data from your question - output is
Upvotes: 1
Reputation: 10172
SELECT (
CASE
WHEN strcol = 1012 THEN STRUCT(startstr, endstr)
WHEN strcol1 = 1012 THEN STRUCT(startstr1, endstr1)
WHEN strcol2 = 1012 THEN STRUCT(startstr2, endstr2)
END
).*
FROM dataset.table
WHERE 1012 IN (strcol, strcol1, strcol2)
Upvotes: 1