Varun Chadha
Varun Chadha

Reputation: 1

Oracle SQL Query, function to split, process and count value in column?

I have a scenario where I would like to count how many splits are present column but the problem now is that the split itself has another count.

This is how data looks for this table:

ID  task_roles
1   author 2x - writer 1x
2   author 1x - writer 1x
3   author

I have created a simple query as below:

SELECT id, task_roles, (LENGTH(task_roles) - LENGTH(REPLACE(task_roles,'-','')) + 1) AS no_of_task_roles 

FROM oracle_table 

I got below output after running this query:

ID  task_roles                           no_of_task_roles 
1   author 2x - writer 1x                2
2   author 1x - writer 1x - tester 3x    3
3   author                               1

However, I need to count mentioned within the split itself as well. e.g. if split is author 2x, then I want to count this as 2 splits instead of 1 and if split is tester 3x, I want to count it as 3 splits instead of 1. So, my expected output should be as below:

ID  task_roles                           no_of_task_roles 
1   author 2x - writer 1x                3
2   author 1x - writer 1x - tester 3x    5
3   author                               1

Can someone please help what should be the updated query for such scenarios?

Upvotes: 0

Views: 233

Answers (3)

user5683823
user5683823

Reputation:

Here is a truly horrible hack for this problem. (I am using "hack" in the good sense: a solution that should work, at least in most if not in all cases, and that is 100% correct, but it uses various tools in ways they weren't meant to be used.)

Two ideas: first, convert the string into an arithmetic expression (still a string, but representing an arithmetic expression, like '1 + 3 + 1'). Then, use xmlquery to evaluate the string as an arithmetic expression. The second step is easy; the first, not as much.

The difficulty in the first step is to handle correctly both "tokens" that have something like a '2x' at the end and those that don't. To handle the "tokens" that don't have a '<num>x' at the end, I add parentheses in the regexp_replace function; then I handle parentheses that have nothing within them, replacing them with 1.

Assumption: All inputs are made up of "tokens" separated by "separators". A token must contain at least one character, and end optionally in one or more digits followed by lower-case 'x'. "Separators" are dashes, preceded and followed by one or more spaces. (This way, hyphenated words within a "token" are still allowed, such as author-composer.)

The sample data is included at the top in the WITH clause (not part of the solution).

with
  sample_data (id, task_roles) as (
    select 1, 'author 2x - writer 1x'                   from dual union all
    select 2, 'author 1x - writer 12x - tester 3x'      from dual union all
    select 3, 'author'                                  from dual union all
    select 4, 'pipe-fitter - mechanic 2x - electrician' from dual
  )
-- end of sample data; solution begins below this line
select id, task_roles,
       xmlcast(xmlquery(replace(
                 regexp_replace(task_roles, '.+?((\d+)x)?( +- +|$)', '+(\2)')
                 , '()', '1') returning content) as number) as role_count
from   sample_data;

ID TASK_ROLES                              ROLE_COUNT
-- --------------------------------------- ----------
 1 author 2x - writer 1x                            3
 2 author 1x - writer 12x - tester 3x              16
 3 author                                           1
 4 pipe-fitter - mechanic 2x - electrician          4

Upvotes: 0

Littlefoot
Littlefoot

Reputation: 142958

Alternatively (read comments within code):

Sample data:

SQL> select * from table_name order by id;

        ID TASK_ROLES
---------- ----------------------------------
         1 author 2x - writer 1x
         2 author 1x - writer 12x - tester 3x
         3 author

Query:

SQL> with temp as
  2    -- Replace all non-digits with spaces and the remove superfluous spaces
  3    -- For example, for ID = 1, you'll get a string ' 2 1'
  4    (select id,
  5      task_roles,
  6      regexp_replace(regexp_replace(task_roles, '[^[:digit:]]', ' '), '\s+', ' ') val
  7     from table_name
  8    )
  9  -- split VAL into rows so that you could SUM those numbers
 10  select id,
 11    task_roles,
 12    sum(to_number(regexp_substr(case when val = ' ' then '1 ' else val end,
 13                  '[^ ]+', 1, column_value))) result
 14  from temp cross join
 15  table(cast(multiset(select level from dual
 16                      connect by level <= regexp_count(val, ' ')
 17                     ) as sys.odcinumberlist))
 18  group by id, task_roles
 19  order by id;

        ID TASK_ROLES                             RESULT
---------- ---------------------------------- ----------
         1 author 2x - writer 1x                       3
         2 author 1x - writer 12x - tester 3x         16
         3 author                                      1

SQL>

Upvotes: 0

MT0
MT0

Reputation: 168232

You can use a recursive sub-query to split the string and find the numbers of roles:

WITH bounds ( ID, task_roles, idx, num_roles, total_roles ) AS (
  SELECT id,
         task_roles,
         1,
         COALESCE(
           TO_NUMBER(
             REGEXP_SUBSTR( task_roles, '(\D+)( (\d+)x)?( - |$)', 1, 1, NULL, 3  )
           ),
           1
         ),
         REGEXP_COUNT( task_roles, '(\D+)( (\d+)x)?( - |$)' )
  FROM   table_name
UNION ALL
  SELECT id,
         task_roles,
         idx + 1,
         COALESCE(
           TO_NUMBER(
             REGEXP_SUBSTR( task_roles, '(\D+)( (\d+)x)?( - |$)', 1, idx + 1, NULL, 3 )
           ),
           1
         ),
         total_roles
  FROM   bounds
  WHERE  idx < total_roles
)
SELECT ID,
       task_roles,
       SUM( num_roles ) AS total_roles
FROM   bounds
GROUP BY ID, task_roles

Which, for the sample data:

CREATE TABLE table_name ( ID, task_roles ) AS
SELECT 1, 'author 2x - writer 1x' FROM DUAL UNION ALL
SELECT 2, 'author 1x - writer 1x - tester 3x' FROM DUAL UNION ALL
SELECT 3, 'author' FROM DUAL;

Outputs:

ID | TASK_ROLES                        | TOTAL_ROLES
-: | :-------------------------------- | ----------:
 2 | author 1x - writer 1x - tester 3x |           5
 3 | author                            |           1
 1 | author 2x - writer 1x             |           3

db<>fiddle here

Upvotes: 1

Related Questions