Reputation: 26262
I'm trying to extract the # of minutes from a text field using Oracle's REGEXP_SUBSTR()
function.
Data:
Treatment of PC7, PT1 on left. 15 min.
15 minutes.
15 minutes
15 mins.
15 mins
15 min.
15 min
15min
15
In each case, I'm hoping to extract the '15' part of the string.
Attempts:
\d+
gets all of the numeric values, including the '7' and '1', which is undesirable.(\d)+(?=\ ?min)
get the '15' from all rows except the last.(?((\d)+(?=\ ?min))((\d)+(?=\ ?min))|\d+)
, a conditional statement, doesnt' match anything.What is wrong with my conditional statement?
** EDIT **
WITH DATA AS (
SELECT 'Treatment of PC7, PT1 on left. 15 min.' COMMENTS FROM DUAL
UNION ALL
SELECT '15 minutes.' COMMENTS FROM DUAL
UNION ALL
SELECT '15 minutes' COMMENTS FROM DUAL
UNION ALL
SELECT '15 mins.' COMMENTS FROM DUAL
UNION ALL
SELECT '15 mins' COMMENTS FROM DUAL
UNION ALL
SELECT '15 min.' COMMENTS FROM DUAL
UNION ALL
SELECT '15 min' COMMENTS FROM DUAL
UNION ALL
SELECT '15min' COMMENTS FROM DUAL
UNION ALL
SELECT '15' COMMENTS FROM DUAL
)
SELECT COMMENTS,
REGEXP_SUBSTR(COMMENTS, '(\d+)\s?(?:min.*)?$', 1, 1) A,
REGEXP_SUBSTR(COMMENTS, '\d+?(?= ?min)|^\d+$', 1, 1) B,
REGEXP_SUBSTR(COMMENTS, '\d+?(?: ?min)|^\d+$', 1, 1) C
FROM DATA
Results (there must be a better way to format columns than as 'code sample'):
COMMENTS A B C
Treatment of PC7, PT1 on left. 15 min.
15 minutes.
15 minutes
15 mins.
15 mins
15 min.
15 min
15min
15 15 15 15
Upvotes: 1
Views: 1721
Reputation: 11602
This Regex will work for you.
^.*?(\d+)(( ?min.*$)|$)
Explanation
^.*?
- matches the beginning of the string, followed by any character 0 or more times(\d+)
- matches at least one digit and stores it in backreference position 1( ?min.*$)
- matches a space (maybe), min
, any character (maybe), then the end of the string.(...|$)
- if it can't find min
, it will see if there is the end of the string instead.Then instead of using REGEXP_SUBSTR()
, use REGEXP_REPLACE()
like this, replacing the entire string with what was stored in backreference position 1 (your number):
REGEXP_REPLACE(COMMENTS, '^.*?(\d+)(( ?min.*$)|$)', '\1') A
Upvotes: 3
Reputation: 26262
Alternate, working solution that employs multiple expressions:
TO_NUMBER(
CASE
-- number only
WHEN LENGTH( REGEXP_SUBSTR(MEAS_COMMENT, '^\d+$', 1, 1) ) >0 THEN REGEXP_SUBSTR(MEAS_COMMENT, '^\d+$', 1, 1)
-- XX min
WHEN LENGTH( REGEXP_SUBSTR(MEAS_COMMENT, '\d+( ?min)', 1, 1) ) >0 THEN REGEXP_SUBSTR(REGEXP_SUBSTR(MEAS_COMMENT, '\d+( ?min)', 1, 1), '\d+', 1, 1)
END
) MINUTES
** edit **
-- same results without the `CASE` statement:
TO_NUMBER( REGEXP_SUBSTR(REGEXP_SUBSTR(COMMENTS, '(\d+)( ?min|$)', 1, 1), '\d+', 1, 1) )
Upvotes: 0
Reputation: 9211
If you can guarantee that the timestamp will always appear as the last part of the string, this works:
(\d+)\s?(?:min.*)?$
I don't know if Oracle supports non-capturing groups, but that's easily worked around. My experience is that it's RegExp engine is limited to quite vanilla stuff.
Upvotes: 1