craig
craig

Reputation: 26262

Regex if-else expression

I'm trying to extract the # of minutes from a text field using Oracle's REGEXP_SUBSTR() function.

Data:

Treatment of PC7, PT1 on left. 15 min.
15 minutes.
15 minutes
15 mins.
15 mins
15 min.
15 min
15min
15

In each case, I'm hoping to extract the '15' part of the string.

Attempts:

What is wrong with my conditional statement?

** EDIT **

WITH DATA AS (

  SELECT 'Treatment of PC7, PT1 on left. 15 min.' COMMENTS FROM DUAL

  UNION ALL

  SELECT '15 minutes.' COMMENTS FROM DUAL

  UNION ALL

  SELECT '15 minutes' COMMENTS FROM DUAL

  UNION ALL

  SELECT '15 mins.' COMMENTS FROM DUAL

  UNION ALL

  SELECT '15 mins' COMMENTS FROM DUAL

  UNION ALL

  SELECT '15 min.' COMMENTS FROM DUAL

  UNION ALL

  SELECT '15 min' COMMENTS FROM DUAL

  UNION ALL

  SELECT '15min' COMMENTS FROM DUAL

  UNION ALL

  SELECT '15' COMMENTS FROM DUAL

)

SELECT  COMMENTS,
        REGEXP_SUBSTR(COMMENTS, '(\d+)\s?(?:min.*)?$', 1, 1) A,
        REGEXP_SUBSTR(COMMENTS, '\d+?(?= ?min)|^\d+$', 1, 1) B,
        REGEXP_SUBSTR(COMMENTS, '\d+?(?: ?min)|^\d+$', 1, 1) C

FROM DATA

Results (there must be a better way to format columns than as 'code sample'):

COMMENTS A B C
Treatment of PC7, PT1 on left. 15 min.          
15 minutes.         
15 minutes          
15 mins.            
15 mins         
15 min.         
15 min          
15min           
15  15  15  15

Upvotes: 1

Views: 1721

Answers (3)

Aust
Aust

Reputation: 11602

This Regex will work for you.

^.*?(\d+)(( ?min.*$)|$)

Explanation

  • ^.*? - matches the beginning of the string, followed by any character 0 or more times
  • (\d+) - matches at least one digit and stores it in backreference position 1
  • ( ?min.*$) - matches a space (maybe), min, any character (maybe), then the end of the string.
  • (...|$) - if it can't find min, it will see if there is the end of the string instead.

Then instead of using REGEXP_SUBSTR(), use REGEXP_REPLACE() like this, replacing the entire string with what was stored in backreference position 1 (your number):

REGEXP_REPLACE(COMMENTS, '^.*?(\d+)(( ?min.*$)|$)', '\1') A

Upvotes: 3

craig
craig

Reputation: 26262

Alternate, working solution that employs multiple expressions:

TO_NUMBER(
    CASE
        -- number only
        WHEN LENGTH( REGEXP_SUBSTR(MEAS_COMMENT, '^\d+$', 1, 1) ) >0 THEN REGEXP_SUBSTR(MEAS_COMMENT, '^\d+$', 1, 1)
        -- XX min
        WHEN LENGTH( REGEXP_SUBSTR(MEAS_COMMENT, '\d+( ?min)', 1, 1) ) >0 THEN REGEXP_SUBSTR(REGEXP_SUBSTR(MEAS_COMMENT, '\d+( ?min)', 1, 1), '\d+', 1, 1)
    END
) MINUTES

** edit **

-- same results without the `CASE` statement:
TO_NUMBER( REGEXP_SUBSTR(REGEXP_SUBSTR(COMMENTS, '(\d+)( ?min|$)', 1, 1), '\d+', 1, 1) )

Upvotes: 0

Xophmeister
Xophmeister

Reputation: 9211

If you can guarantee that the timestamp will always appear as the last part of the string, this works:

(\d+)\s?(?:min.*)?$

I don't know if Oracle supports non-capturing groups, but that's easily worked around. My experience is that it's RegExp engine is limited to quite vanilla stuff.

Upvotes: 1

Related Questions