Reputation: 641
I have the below one column "table" (apologies for the data model, not my fault :():
COL_IN
------
2K, E
E, 2K
O
I would like to obtain the below output, ordered by count descending:
COL_OUT COUNT
----------
K 4
E 2
O 1
Upvotes: 0
Views: 91
Reputation:
COUNT
is a reserved keyword, so it's not a good column name - even in the final output. I use COUNT_
instead (with an underscore).
Other than that, you can modify the input strings so they become valid JSON arrays, so that you can then use JSON functions to split them. After you split the strings into tokens, it's a simple matter to separate the leading number (if present) from the rest of the string, and to aggregate. NVL
in the sum adds 1 for each token without a leading integer.
Including the sample data for testing only (if you have an actual table, remove the WITH
clause at the top):
with
tbl (col_in) as (
select '2K, E' from dual union all
select 'E, 2K' from dual union all
select 'O' from dual
)
select ltrim(col, '0123456789') as col_out
, sum(nvl(to_number(regexp_substr(col, '^\d*')), 1)) as count_
from tbl,
json_table('["' || regexp_replace(col_in, ', *', '","') || '"]', '$[*]'
columns col path '$')
group by ltrim(col, '0123456789')
order by count_ desc, col_out
;
COL_OUT COUNT_
------- ------
K 4
E 2
O 1
Upvotes: 2
Reputation: 65393
You can use hierarchical query in such a way that
WITH t2 AS
(
SELECT TRIM(REGEXP_SUBSTR(col_in,'[^,]+',1,level)) AS s
FROM t
CONNECT BY level <= REGEXP_COUNT(col_in,',')+1
AND PRIOR SYS_GUID() IS NOT NULL
AND PRIOR col_in = col_in
)
SELECT REGEXP_SUBSTR(s,'[^0-9]') AS col_out,
SUM(NVL(REGEXP_SUBSTR(s,'[^[:alpha:]]'),1)) AS count
FROM t2
GROUP BY REGEXP_SUBSTR(s,'[^0-9]'),REGEXP_SUBSTR(s,'[^[:alpha:]]')
ORDER BY count DESC
presuming all of the data are alphanumeric only(eg.not containing special charaters such as $,#,! ..etc.)
Upvotes: 2