Reputation: 104
I am working on regex_substr
to get the following result in oracle
Test 1.Input: JOHN 10BC STUDENT Desired Output: JOHN
Test 2.Input: JOHN STUDENT Desired Output: JOHN
Test 3.Input: JOHN 10BC STU Desired Output: JOHN
Test 4.Input: JOHN 10BC TEACHER Desired Output:NULL
Test 5.Input: JOHN TEACHER Desired Output:NULL
Test 6. Input: MR JOHN 08BC STU Desired Output: MR JOHN
Test 7. Input: MR JOHN STUDENT Desired Output: MR JOHN
Test 8. Input: MR JOHN 07BC TEACHER Desired Output: Null
Test 9. Input: MR STUART 06BC STDUENT Desired Output: MR STUART
Test 10. Input: MR STUART LEE 05BC STDUENT Desired Output: MR STUART LEE
Query:
Test 1:
select REGEXP_SUBSTR('JOHN 10BC STUDENT','(.*)(\s+.*BC)?\sSTU(DENT)?',1,1,'i',1) from dual;
Output: Failed . Returned JOHN 10BC instead of JOHN
Test 2:
select REGEXP_SUBSTR('JOHN STUDENT','(.*)(\s+.*BC)?\sSTU(DENT)?',1,1,'i',1) from dual;
Output: Passed. Returned JOHN
I modified the query as below by removing the ? in the second block
Test 1:select REGEXP_SUBSTR('JOHN 10BC STUDENT','(.*)(\s+.*BC)\sSTU(DENT)?',1,1,'i',1) from dual;
Output: Passed. Returned JOHN
Test 2:select REGEXP_SUBSTR('JOHN STUDENT','(.*)(\s+.*BC)\sSTU(DENT)?',1,1,'i',1) from dual;
Output: Failed. Returned Null instead of JOHN.
How to ignore the middle optional word BC and always return the words till BC as a substring for STUDENT using REGEXP_SUBSTR
. Thanks in Advance!!!
Upvotes: 0
Views: 358
Reputation: 6751
Use non-greedy pattern for any-character match (.*
) by adding a question mark after it (.*?
) to prevent it from grabbing an optional part ((\s+.*BC)?
):
with test (id, col) as ( select 1, 'JOHN 10BC STUDENT' from dual union all select 2, 'JOHN STUDENT' from dual union all select 3, 'JOHN 10BC STU' from dual union all select 4, 'JOHN 10BC TEACHER' from dual union all select 4, 'JSTUOHN 10BC TEACHER' from dual union all select 5, 'JOHN TEACHER' from dual ) select id, col, regexp_substr(col, '(.*?)(\s+.*BC)?\sSTU(DENT)?',1,1,'i',1) as qwe from test
ID | COL | QWE -: | :------------------- | :---- 1 | JOHN 10BC STUDENT | JOHN 2 | JOHN STUDENT | JOHN 3 | JOHN 10BC STU | JOHN 4 | JOHN 10BC TEACHER | null 4 | JSTUOHN 10BC TEACHER | null 5 | JOHN TEACHER | null
db<>fiddle here
UPD. To allow multiple words as the first part, you may use the following regex:
(.+?)(\s+\S*BC)?\s+STU(DENT)?$
Explanation:
(.+?)
- match any character one or more times with the shortest match (allowing following optional groups to be matched).(\s+\S*BC)?
- optional match for the group of zero or more non-whitespace (\S
) characters followed by BC
. The group should be separated from the previous non-whitespace character by one or more whitespace characters. \S*
instead of .*
will prevent from inclusion of multiple words into this group.\s+STU(DENT)?$
- the last part should be a word STUDENT
or it's abbreviation STU
. It should be the last word in the line ($
right after it).And the result of extended test case is:
with test (id, col) as ( select 1, 'JOHN 10BC STUDENT' from dual union all select 2, 'JOHN STUDENT' from dual union all select 3, 'JOHN 10BC STU' from dual union all select 4, 'JOHN 10BC TEACHER' from dual union all select 4, 'JSTUOHN 10BC TEACHER' from dual union all select 5, 'JOHN TEACHER' from dual union all select 6, 'MR JOHN 10BC STU' from dual union all select 7, 'MR STUART ABC 10BC STUDENT' from dual union all select 8, 'MR STUART ABC 10BC STUDENTS' from dual union all select 9, 'MR STUART STU 10BC TEACHER' from dual ) select id, col, regexp_substr(col, '(.*?)(\s+\S*BC)?\s+STU(DENT)?$',1,1,'i',1) as qwe from test
ID | COL | QWE -: | :-------------------------- | :------------ 1 | JOHN 10BC STUDENT | JOHN 2 | JOHN STUDENT | JOHN 3 | JOHN 10BC STU | JOHN 4 | JOHN 10BC TEACHER | null 4 | JSTUOHN 10BC TEACHER | null 5 | JOHN TEACHER | null 6 | MR JOHN 10BC STU | MR JOHN 7 | MR STUART ABC 10BC STUDENT | MR STUART ABC 8 | MR STUART ABC 10BC STUDENTS | null 9 | MR STUART STU 10BC TEACHER | null
db<>fiddle here
Upvotes: 1
Reputation: 168106
How to ignore the middle optional word
10BC
and always return the first word as a substring forSTUDENT
?
You do not need to use (slow) regular expressions and can use (much faster) simple string functions:
WITH test (id, col) as (
select 1, 'JOHN 10BC STUDENT' from dual union all
select 2, 'JOHN STUDENT' from dual union all
select 3, 'JOHN 10BC STU' from dual union all
select 4, 'JOHN 10BC TEACHER' from dual union all
select 5, 'JSTUOHN 10BC TEACHER' from dual union all
select 6, 'JOHN TEACHER' from dual
)
SELECT id,
col,
CASE
WHEN col LIKE '% STUDENT'
OR col LIKE '% STU'
THEN SUBSTR(col, 1, INSTR(col, ' ') - 1)
END AS first_word
FROM test
(using astentx's data)
The output is:
ID COL FIRST_WORD 1 JOHN 10BC STUDENT JOHN 2 JOHN STUDENT JOHN 3 JOHN 10BC STU JOHN 4 JOHN 10BC TEACHER null 5 JSTUOHN 10BC TEACHER null 6 JOHN TEACHER null
db<>fiddle here
Upvotes: 0
Reputation: 142798
Do you really need to check what's in the middle? Perhaps you could ignore that and work with the rest of the requirement (check for STU or STUDENT (as your sample data suggest) and fetch data according to that)?
Here are two options (result_1
and result_2
), see if any of these help.
Sample data:
SQL> with test (id, col) as
2 (select 1, 'JOHN 10BC STUDENT' from dual union all
3 select 2, 'JOHN STUDENT' from dual union all
4 select 3, 'JOHN 10BC STU' from dual union all
5 select 4, 'JOHN 10BC TEACHER' from dual union all
6 select 5, 'JOHN TEACHER' from dual
7 )
Query begins here:
8 select id,
9 col,
10 case when instr(col, 'STU') > 0 then substr(col, 1, instr(col, ' ') - 1)
11 else null
12 end result_1,
13 --
14 case when regexp_substr(col, '\w+$') in ('STU', 'STUDENT') then
15 regexp_substr(col, '^\w+')
16 else null
17 end result_2
18 from test;
ID COL RESULT_1 RESULT_2
---------- ----------------- -------------------- --------------------
1 JOHN 10BC STUDENT JOHN JOHN
2 JOHN STUDENT JOHN JOHN
3 JOHN 10BC STU JOHN JOHN
4 JOHN 10BC TEACHER
5 JOHN TEACHER
SQL>
result_1
searches col
for the STU
string; if it is found, it returns the first word (from beginning of col
up to the first space character)result_2
checks whether the last word (but this time using regexp_substr, anchoring to the end $
of col
) is "STU" or "STUDENT" and returns the first word (anchored to the beginning ^
of col
Upvotes: 0