Maykid
Maykid

Reputation: 517

Having a problem with Case When function in SQL

So I'm having a problem with a Case When function that I'm trying to run. In theory to me at least it should be able to fail the first Case When and go to the second When conclusion but doesn't. For examples of the data I'm looking at is as follows:

HM_9991256_RDR_LL_BA_NONE_REDDEAD
HW_1005489_HWA_AA_DG_NONE_HELLOWORLD
GG_8504807_BBA_CL_OI_NONE_GOODGAME
100_BB_4186482_CGA_IG_NONE_CALIFORNIA
200_HB_48558012_DDA_EV_NONE_EVEONLINE
900_GI_55488941_CA_DV_NONE_BATTLEFIELD

The Case When Function I'm running currently is:

CASE WHEN split(LINE_ITEM, '_')[OFFSET(1)] NOT LIKE '%[0-9]%' THEN regexp_replace(split(LINE_ITEM, '_')[OFFSET(2)], r'[^0-9]', '')
     WHEN split(LINE_ITEM, '_')[OFFSET(0)] NOT LIKE '%[0-9]%' THEN regexp_replace(split(LINE_ITEM, '_')[OFFSET(1)], r'[^0-9]', '')
  END AS ID_VARIABLE

Initial thought for me when I see this is, if it's it doesn't have numbers in the OFFSET(1) position it will grab OFFSET(2), if there are no numbers in OFFSET(0) position it will grab OFFSET(1). What I'm running into though is it will hit the first WHEN Statement and never go to the second WHEN Statement. My End goal is this:

ID_VARIABLE
  9991256
  1005489
  8504807
  4186482
  48558012
  55488941

But what I'm currently getting is this:

ID_VARIABLE
  4186482
  48558012
  55488941

As you can see I'm missing the first section of the ID's. But, I think I'm really close but either I'm writing something incorrectly or not seeing something correctly. Any help would be great with this!

-Maykid

Upvotes: 0

Views: 116

Answers (1)

Mikhail Berlyant
Mikhail Berlyant

Reputation: 172944

Below is for BigQuery Standard SQL

#standardSQL
SELECT 
  IF(
    SAFE_CAST(i1 AS INT64) IS NULL, 
    i2, 
    IF(SAFE_CAST(i0 AS INT64) IS NULL, i1, NULL)
  ) ID_VARIABLE
FROM `project.dataset.table`, 
  UNNEST([STRUCT<i0 STRING, i1 STRING, i2 STRING>(
    SPLIT(LINE_ITEM, '_')[OFFSET(0)], 
    SPLIT(LINE_ITEM, '_')[OFFSET(1)], 
    SPLIT(LINE_ITEM, '_')[OFFSET(2)]
  )])

If to apply to dummy data from your question as below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'HM_9991256_RDR_LL_BA_NONE_REDDEAD' LINE_ITEM UNION ALL
  SELECT 'HW_1005489_HWA_AA_DG_NONE_HELLOWORLD' UNION ALL
  SELECT 'GG_8504807_BBA_CL_OI_NONE_GOODGAME' UNION ALL
  SELECT '100_BB_4186482_CGA_IG_NONE_CALIFORNIA' UNION ALL
  SELECT '200_HB_48558012_DDA_EV_NONE_EVEONLINE' UNION ALL
  SELECT '900_GI_55488941_CA_DV_NONE_BATTLEFIELD' 
)
SELECT 
  IF(
    SAFE_CAST(i1 AS INT64) IS NULL, 
    i2, 
    IF(SAFE_CAST(i0 AS INT64) IS NULL, i1, NULL)
  ) ID_VARIABLE
FROM `project.dataset.table`, 
  UNNEST([STRUCT<i0 STRING, i1 STRING, i2 STRING>(
    SPLIT(LINE_ITEM, '_')[OFFSET(0)], 
    SPLIT(LINE_ITEM, '_')[OFFSET(1)], 
    SPLIT(LINE_ITEM, '_')[OFFSET(2)]
  )])

result will be as expected:

Row ID_VARIABLE  
1   9991256  
2   1005489  
3   8504807  
4   4186482  
5   48558012     
6   55488941       

If for some reason you are bound to CASE WHEN statement - below should work for you

#standardSQL   
SELECT 
  CASE WHEN REGEXP_CONTAINS(i1, r'[^0-9]') THEN REGEXP_REPLACE(i2, r'[^0-9]', '')
       WHEN REGEXP_CONTAINS(i0, r'[^0-9]') THEN REGEXP_REPLACE(i1, r'[^0-9]', '')
    END AS ID_VARIABLE      
FROM `project.dataset.table`, 
  UNNEST([STRUCT<i0 STRING, i1 STRING, i2 STRING>(
    SPLIT(LINE_ITEM, '_')[OFFSET(0)], 
    SPLIT(LINE_ITEM, '_')[OFFSET(1)], 
    SPLIT(LINE_ITEM, '_')[OFFSET(2)]
  )])

Upvotes: 3

Related Questions