banana_99
banana_99

Reputation: 641

Count and order comma separated values

I have the below one column "table" (apologies for the data model, not my fault :():

COL_IN
------
2K, E
E, 2K
O

I would like to obtain the below output, ordered by count descending:

COL_OUT COUNT
----------
K   4
E   2
O   1

Upvotes: 0

Views: 91

Answers (2)

user5683823
user5683823

Reputation:

COUNT is a reserved keyword, so it's not a good column name - even in the final output. I use COUNT_ instead (with an underscore).

Other than that, you can modify the input strings so they become valid JSON arrays, so that you can then use JSON functions to split them. After you split the strings into tokens, it's a simple matter to separate the leading number (if present) from the rest of the string, and to aggregate. NVL in the sum adds 1 for each token without a leading integer.

Including the sample data for testing only (if you have an actual table, remove the WITH clause at the top):

with
  tbl (col_in) as (
    select '2K, E' from dual union all
    select 'E, 2K' from dual union all
    select 'O'     from dual
  )
select ltrim(col, '0123456789') as col_out
     , sum(nvl(to_number(regexp_substr(col, '^\d*')), 1)) as count_
from   tbl,
       json_table('["' || regexp_replace(col_in, ', *', '","') || '"]', '$[*]'
            columns col path '$')
group  by ltrim(col, '0123456789')
order  by count_ desc, col_out
;

COL_OUT COUNT_
------- ------
K            4
E            2
O            1

Upvotes: 2

Barbaros Özhan
Barbaros Özhan

Reputation: 65393

You can use hierarchical query in such a way that

WITH t2 AS
(
   SELECT TRIM(REGEXP_SUBSTR(col_in,'[^,]+',1,level)) AS s
     FROM t
  CONNECT BY level <= REGEXP_COUNT(col_in,',')+1 
      AND PRIOR SYS_GUID() IS NOT NULL
      AND PRIOR col_in = col_in
)
SELECT REGEXP_SUBSTR(s,'[^0-9]') AS col_out,
       SUM(NVL(REGEXP_SUBSTR(s,'[^[:alpha:]]'),1)) AS count
  FROM t2
 GROUP BY REGEXP_SUBSTR(s,'[^0-9]'),REGEXP_SUBSTR(s,'[^[:alpha:]]')
 ORDER BY count DESC  
  

presuming all of the data are alphanumeric only(eg.not containing special charaters such as $,#,! ..etc.)

Upvotes: 2

Related Questions