Reputation: 1
I have a scenario where I would like to count how many splits are present column but the problem now is that the split itself has another count.
This is how data looks for this table:
ID task_roles
1 author 2x - writer 1x
2 author 1x - writer 1x
3 author
I have created a simple query as below:
SELECT id, task_roles, (LENGTH(task_roles) - LENGTH(REPLACE(task_roles,'-','')) + 1) AS no_of_task_roles
FROM oracle_table
I got below output after running this query:
ID task_roles no_of_task_roles
1 author 2x - writer 1x 2
2 author 1x - writer 1x - tester 3x 3
3 author 1
However, I need to count mentioned within the split itself as well. e.g. if split is author 2x, then I want to count this as 2 splits instead of 1 and if split is tester 3x, I want to count it as 3 splits instead of 1. So, my expected output should be as below:
ID task_roles no_of_task_roles
1 author 2x - writer 1x 3
2 author 1x - writer 1x - tester 3x 5
3 author 1
Can someone please help what should be the updated query for such scenarios?
Upvotes: 0
Views: 233
Reputation:
Here is a truly horrible hack for this problem. (I am using "hack" in the good sense: a solution that should work, at least in most if not in all cases, and that is 100% correct, but it uses various tools in ways they weren't meant to be used.)
Two ideas: first, convert the string into an arithmetic expression (still a string, but representing an arithmetic expression, like '1 + 3 + 1'). Then, use xmlquery
to evaluate the string as an arithmetic expression. The second step is easy; the first, not as much.
The difficulty in the first step is to handle correctly both "tokens" that have something like a '2x'
at the end and those that don't. To handle the "tokens" that don't have a '<num>x'
at the end, I add parentheses in the regexp_replace
function; then I handle parentheses that have nothing within them, replacing them with 1.
Assumption: All inputs are made up of "tokens" separated by "separators". A token must contain at least one character, and end optionally in one or more digits followed by lower-case 'x'. "Separators" are dashes, preceded and followed by one or more spaces. (This way, hyphenated words within a "token" are still allowed, such as author-composer.)
The sample data is included at the top in the WITH clause (not part of the solution).
with
sample_data (id, task_roles) as (
select 1, 'author 2x - writer 1x' from dual union all
select 2, 'author 1x - writer 12x - tester 3x' from dual union all
select 3, 'author' from dual union all
select 4, 'pipe-fitter - mechanic 2x - electrician' from dual
)
-- end of sample data; solution begins below this line
select id, task_roles,
xmlcast(xmlquery(replace(
regexp_replace(task_roles, '.+?((\d+)x)?( +- +|$)', '+(\2)')
, '()', '1') returning content) as number) as role_count
from sample_data;
ID TASK_ROLES ROLE_COUNT
-- --------------------------------------- ----------
1 author 2x - writer 1x 3
2 author 1x - writer 12x - tester 3x 16
3 author 1
4 pipe-fitter - mechanic 2x - electrician 4
Upvotes: 0
Reputation: 142958
Alternatively (read comments within code):
Sample data:
SQL> select * from table_name order by id;
ID TASK_ROLES
---------- ----------------------------------
1 author 2x - writer 1x
2 author 1x - writer 12x - tester 3x
3 author
Query:
SQL> with temp as
2 -- Replace all non-digits with spaces and the remove superfluous spaces
3 -- For example, for ID = 1, you'll get a string ' 2 1'
4 (select id,
5 task_roles,
6 regexp_replace(regexp_replace(task_roles, '[^[:digit:]]', ' '), '\s+', ' ') val
7 from table_name
8 )
9 -- split VAL into rows so that you could SUM those numbers
10 select id,
11 task_roles,
12 sum(to_number(regexp_substr(case when val = ' ' then '1 ' else val end,
13 '[^ ]+', 1, column_value))) result
14 from temp cross join
15 table(cast(multiset(select level from dual
16 connect by level <= regexp_count(val, ' ')
17 ) as sys.odcinumberlist))
18 group by id, task_roles
19 order by id;
ID TASK_ROLES RESULT
---------- ---------------------------------- ----------
1 author 2x - writer 1x 3
2 author 1x - writer 12x - tester 3x 16
3 author 1
SQL>
Upvotes: 0
Reputation: 168232
You can use a recursive sub-query to split the string and find the numbers of roles:
WITH bounds ( ID, task_roles, idx, num_roles, total_roles ) AS (
SELECT id,
task_roles,
1,
COALESCE(
TO_NUMBER(
REGEXP_SUBSTR( task_roles, '(\D+)( (\d+)x)?( - |$)', 1, 1, NULL, 3 )
),
1
),
REGEXP_COUNT( task_roles, '(\D+)( (\d+)x)?( - |$)' )
FROM table_name
UNION ALL
SELECT id,
task_roles,
idx + 1,
COALESCE(
TO_NUMBER(
REGEXP_SUBSTR( task_roles, '(\D+)( (\d+)x)?( - |$)', 1, idx + 1, NULL, 3 )
),
1
),
total_roles
FROM bounds
WHERE idx < total_roles
)
SELECT ID,
task_roles,
SUM( num_roles ) AS total_roles
FROM bounds
GROUP BY ID, task_roles
Which, for the sample data:
CREATE TABLE table_name ( ID, task_roles ) AS
SELECT 1, 'author 2x - writer 1x' FROM DUAL UNION ALL
SELECT 2, 'author 1x - writer 1x - tester 3x' FROM DUAL UNION ALL
SELECT 3, 'author' FROM DUAL;
Outputs:
ID | TASK_ROLES | TOTAL_ROLES -: | :-------------------------------- | ----------: 2 | author 1x - writer 1x - tester 3x | 5 3 | author | 1 1 | author 2x - writer 1x | 3
db<>fiddle here
Upvotes: 1