Reputation: 517
So I'm having a problem with a Case When function that I'm trying to run. In theory to me at least it should be able to fail the first Case When and go to the second When conclusion but doesn't. For examples of the data I'm looking at is as follows:
HM_9991256_RDR_LL_BA_NONE_REDDEAD
HW_1005489_HWA_AA_DG_NONE_HELLOWORLD
GG_8504807_BBA_CL_OI_NONE_GOODGAME
100_BB_4186482_CGA_IG_NONE_CALIFORNIA
200_HB_48558012_DDA_EV_NONE_EVEONLINE
900_GI_55488941_CA_DV_NONE_BATTLEFIELD
The Case When Function I'm running currently is:
CASE WHEN split(LINE_ITEM, '_')[OFFSET(1)] NOT LIKE '%[0-9]%' THEN regexp_replace(split(LINE_ITEM, '_')[OFFSET(2)], r'[^0-9]', '')
WHEN split(LINE_ITEM, '_')[OFFSET(0)] NOT LIKE '%[0-9]%' THEN regexp_replace(split(LINE_ITEM, '_')[OFFSET(1)], r'[^0-9]', '')
END AS ID_VARIABLE
Initial thought for me when I see this is, if it's it doesn't have numbers in the OFFSET(1)
position it will grab OFFSET(2)
, if there are no numbers in OFFSET(0)
position it will grab OFFSET(1)
. What I'm running into though is it will hit the first WHEN
Statement and never go to the second WHEN
Statement. My End goal is this:
ID_VARIABLE
9991256
1005489
8504807
4186482
48558012
55488941
But what I'm currently getting is this:
ID_VARIABLE
4186482
48558012
55488941
As you can see I'm missing the first section of the ID's. But, I think I'm really close but either I'm writing something incorrectly or not seeing something correctly. Any help would be great with this!
-Maykid
Upvotes: 0
Views: 116
Reputation: 172944
Below is for BigQuery Standard SQL
#standardSQL
SELECT
IF(
SAFE_CAST(i1 AS INT64) IS NULL,
i2,
IF(SAFE_CAST(i0 AS INT64) IS NULL, i1, NULL)
) ID_VARIABLE
FROM `project.dataset.table`,
UNNEST([STRUCT<i0 STRING, i1 STRING, i2 STRING>(
SPLIT(LINE_ITEM, '_')[OFFSET(0)],
SPLIT(LINE_ITEM, '_')[OFFSET(1)],
SPLIT(LINE_ITEM, '_')[OFFSET(2)]
)])
If to apply to dummy data from your question as below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'HM_9991256_RDR_LL_BA_NONE_REDDEAD' LINE_ITEM UNION ALL
SELECT 'HW_1005489_HWA_AA_DG_NONE_HELLOWORLD' UNION ALL
SELECT 'GG_8504807_BBA_CL_OI_NONE_GOODGAME' UNION ALL
SELECT '100_BB_4186482_CGA_IG_NONE_CALIFORNIA' UNION ALL
SELECT '200_HB_48558012_DDA_EV_NONE_EVEONLINE' UNION ALL
SELECT '900_GI_55488941_CA_DV_NONE_BATTLEFIELD'
)
SELECT
IF(
SAFE_CAST(i1 AS INT64) IS NULL,
i2,
IF(SAFE_CAST(i0 AS INT64) IS NULL, i1, NULL)
) ID_VARIABLE
FROM `project.dataset.table`,
UNNEST([STRUCT<i0 STRING, i1 STRING, i2 STRING>(
SPLIT(LINE_ITEM, '_')[OFFSET(0)],
SPLIT(LINE_ITEM, '_')[OFFSET(1)],
SPLIT(LINE_ITEM, '_')[OFFSET(2)]
)])
result will be as expected:
Row ID_VARIABLE
1 9991256
2 1005489
3 8504807
4 4186482
5 48558012
6 55488941
If for some reason you are bound to CASE WHEN
statement - below should work for you
#standardSQL
SELECT
CASE WHEN REGEXP_CONTAINS(i1, r'[^0-9]') THEN REGEXP_REPLACE(i2, r'[^0-9]', '')
WHEN REGEXP_CONTAINS(i0, r'[^0-9]') THEN REGEXP_REPLACE(i1, r'[^0-9]', '')
END AS ID_VARIABLE
FROM `project.dataset.table`,
UNNEST([STRUCT<i0 STRING, i1 STRING, i2 STRING>(
SPLIT(LINE_ITEM, '_')[OFFSET(0)],
SPLIT(LINE_ITEM, '_')[OFFSET(1)],
SPLIT(LINE_ITEM, '_')[OFFSET(2)]
)])
Upvotes: 3