Angelina
Angelina

Reputation: 2265

What's the best way to check if characters exist in string in SQL?

I have strings like:

t.reported_name
-------------------------
    D3O using TM-0549 - Rev # 6
    D3O using TM-0549 - Rev # 6
    Water using TM-0415 - Rev #10
    Water using TM-0449 - Rev # 10
    Decanoic Acid using LL-1448 - Rev# 2
    DBE-821 using QCRM-0015 - Rev#1
    Water using TM-0441 Rev # 10
    FC Sessile Drop Contact Angle using MTM-017_REV_B - Rev # 1
    IPA using QCRM-0017
    Norbloc using TM-0501 - Rev # 5
    DK (non-edge corrected) using TM-0534 - Rev # 3
    Decanoic Acid_L3 using LL-1448_L3
    Decanoic Acid_L4 using LL-1448_L4
    MXP7-1911 using CRM-0239
    TMPTMA using TM-0515 - Rev# 8
    DK (edge corrected) using MTM-09 - Rev# 0

I need to extract test method (anything after 'using') with number, i.e: TM-0549 and Revision number, i.e: Rev # 6

select  distinct 
     case when REGEXP_LIKE(t.reported_name,'TM-', 'c') THEN SUBSTR(t.reported_name, INSTR(t.reported_name, 'TM'), 7) END test_method,
     case when INSTR(t.reported_name,'Rev #') = 1 THEN SUBSTR(t.reported_name, INSTR(t.reported_name, 'Rev'), 7) END revision_number
from test s

from above data I want:

test_method  revision_number

TM-0549     Rev # 6
TM-0549     Rev # 6
TM-0415     Rev #10
TM-0449     Rev # 10
LL-1448     Rev# 2
QCRM-0015   Rev#1
TM-0441     Rev # 10
MTM-017_REV_B  Rev # 1
QCRM-0017   null
TM-0501     Rev # 5
TM-0534     Rev # 3
LL-1448_L3  null
LL-1448_L4  null
CRM-0239    null
TM-0515     Rev# 8
MTM-09      Rev# 0

Upvotes: 1

Views: 1409

Answers (3)

Rost
Rost

Reputation: 31

As for me for particular situation query can look like:

WITH j
     AS (SELECT 'D3O using TM-0549 - Rev # 6' str FROM DUAL
         UNION ALL
         SELECT 'D3O using TM-0549 - Rev # 6' FROM DUAL
         UNION ALL
         SELECT 'Water using TM-0415 - Rev #10' FROM DUAL
         UNION ALL
         SELECT 'Water using TM-0449 - Rev # 10' FROM DUAL
         UNION ALL
         SELECT 'Decanoic Acid using LL-1448 - Rev# 2' FROM DUAL
         UNION ALL
         SELECT 'DBE-821 using QCRM-0015 - Rev#1' FROM DUAL
         UNION ALL
         SELECT 'Water using TM-0441 Rev # 10' FROM DUAL
         UNION ALL
         SELECT 'FC Sessile Drop Contact Angle using MTM-017_REV_B - Rev # 1' FROM DUAL
         UNION ALL
         SELECT 'IPA using QCRM-0017' FROM DUAL
         UNION ALL
         SELECT 'Norbloc using TM-0501 - Rev # 5' FROM DUAL
         UNION ALL
         SELECT 'DK (non-edge corrected) using TM-0534 - Rev # 3' FROM DUAL
         UNION ALL
         SELECT 'Decanoic Acid_L3 using LL-1448_L3' FROM DUAL
         UNION ALL
         SELECT 'Decanoic Acid_L4 using LL-1448_L4' FROM DUAL
         UNION ALL
         SELECT 'MXP7-1911 using CRM-0239' FROM DUAL
         UNION ALL
         SELECT 'TMPTMA using TM-0515 - Rev# 8' FROM DUAL
         UNION ALL
         SELECT 'DK (edge corrected) using MTM-09 - Rev# 0' FROM DUAL)

SELECT TRIM(RTRIM(TRIM (SUBSTR (clear_str, 0, INSTR (clear_str, ' ') + LENGTH (' '))),'-')) AS left_str,
       TRIM(LTRIM(TRIM (SUBSTR (clear_str, INSTR (clear_str, ' ') + LENGTH (' '))),'-')) AS right_str
  FROM (SELECT TRIM (SUBSTR (str, INSTR (str, 'using') + LENGTH ('using'))) || ' ' clear_str FROM j)

UPD. Also, this solutions not depends on business data like 'Rev' or something else. But it is sensitive to spaces in our test_method value

Upvotes: 1

Frank Schmitt
Frank Schmitt

Reputation: 30775

Assuming

  • for test_method, we want to match TM-<number>
  • for revision, we want to match Rev # <number> (note the spaces around #)

Then here's a solution with REGEXP_SUBSTR:

select 
  regexp_substr(reported_name, 'TM\-[0-9]+') as test_method_regexsub,
  regexp_substr(reported_name, 'Rev # [0-9]+') as revision_regexsub            
from test t

And here's another one with REGEXP_REPLACE; we have to use the CASE/REGEXP_LIKE workaround to return an empty string if the regex doesn't match, because REGEXP_REPLACE returns the whole string unchanged if no match is found:

select 
  (case 
     when regexp_like(reported_name, '.*(TM\-[0-9]+).*') 
     then regexp_replace(reported_name, '.*(TM\-[0-9]+).*', '\1') 
     else '' 
   end)  as test_method_regexrepl,
  (case 
     when regexp_like(reported_name, '.*(Rev # [0-9]+).*') 
     then regexp_replace(reported_name, '.*(Rev # [0-9]+).*', '\1') 
     else '' 
   end)  as revision_regexrepl
from test t

The second approach uses a capturing group (Rev # [0-9]+) and replaces the whole string with its contents \1.

2nd UPDATE

Assuming

  • everything in front of using should be ignored
  • everything up to an optional Rev is the test method name
  • a revision consists of Rev # <number>, where the first space is optional

this should work:

select reported_name,
       (case 
          when regexp_like(reported_name, '.* using (.*)( - Rev.*)') 
            then regexp_replace(reported_name, '.* using (.*)( - Rev.*)', '\1') 
          when regexp_like(reported_name, '.* using (.*)') 
            then regexp_replace(reported_name, '.* using (.*)', '\1') 
          else '' end)  as test_method_regexrepl,
       (case when regexp_like(reported_name, '.* - (Rev[ ]?# [0-9]+)') 
          then regexp_replace(reported_name, '.*(Rev[ ]?# [0-9]+)', '\1') 
          else '' end)  as revision_regexrepl
from test t

Explanation:

  • .* using (.*)( - Rev.*) is our regex for a test method that has a revision. It matches

    • an arbitrary string .*
    • the string using (note the two spaces)
    • an arbitrary string (.*) - we use the parentheses () to capture this part of the match in a group
    • the string - Rev, followed by an arbitrary string; again, we use parentheses to capture the string in a group (although we don't really need that)

    If we have a match, we replace the whole string with the first capturing group \1 (this contains the part between using and Rev

  • .* using (.*) is our fallback for a test method without the revision; it matches

    • an arbitrary string .*
    • the string using (note the two spaces)
    • an arbitrary string (.*) - we use the parentheses () to capture this part of the match in a group

    If we have a match, we replace the whole string with the first capturing group \1 (this contains the part between using and Rev

  • .* - (Rev[ ]?# [0-9]+) is our regex for the revision part. It matches

    • an arbitrary string followed by a hyphen surrounded by spaces .* -
    • the word Rev
    • an optional space [ ]?
    • a lattice followed by a space #
    • one or more digits [0-9]+ and again uses a capturing group (Rev...) for the "interesting" part

    If we have a match, we replace the whole string with the first capturing group \1 (this contains the part between Rev and the last digit)

SQL Fiddle

Upvotes: 1

vav
vav

Reputation: 4694

TMPTMA is not matched by your regexp, instr is working:

select 'regexp match' descr, count(*) 
from dual where regexp_like('TM-PTMA', 'TM-') 
union all
select 'regexp no match', count(*) 
from dual where regexp_like('TMPTMA', 'TM-') 
union all
select 'instr no match', count(*) 
from dual where instr('TMPTMA', 'TM-') > 0
union all
select 'instr match', count(*) 
from dual where instr('TM-PTMA', 'TM-') > 0

Output:

DESCR             COUNT(*)
--------------- ----------
regexp match             1 
regexp no match          0 
instr no match           0 
instr match              1 

Upvotes: 1

Related Questions