B-Ray
B-Ray

Reputation: 471

Maximum Count Repeating Characters in Oracle SQL Column

I have a column in a database of license numbers. I am trying to determine if the numbers are legitimate. One of the tests is to determine if there are multiple occurrences of the same character. I'm trying to determine what the count of the most repeated character is. The second test is to determine if all the numbers are sequential, but I'm posting that as a different question.

An example of what I'm looking for:

LICENSE_NUMBER  MAX_COUNT
111246544       3
999999999       9
123456789       0
AAAAAAAAA       9
A12345667       2

Thanks for your help!

Upvotes: 0

Views: 2199

Answers (4)

user5683823
user5683823

Reputation:

Here is a solution to a different formulation of the same problem. Suppose we want to find the maximum number of CONSECUTIVE occurrences of the same character in the license number. For example, if the license number is 112211220, then both 1 and 2 appear 4 times. However, the max number of CONSECUTIVE occurrences of the same character is 2.

This question requires breaking up each license number into its individual characters, keeping track of their position in the string. I do that in the first CTE ("first" not counting the test data, that is). Then identifying consecutive "something" is often done most efficiently using the so-called Tabibitosan method (the CTE using two different calls to row_number()). After that, it's a simple matter of grouping, counting and taking the max.

with
-- begin of test data, not part of the solution
     test_data ( license_number ) as (
       select '111246544' from dual union all
       select '999999999' from dual union all
       select '123456789' from dual union all
       select 'AAAAAAAAA' from dual union all
       select 'A12345667' from dual union all
       select '112211220' from dual
     ),
-- end of test data; solution (SQL query) continues below this line
     tokenized ( license_number, idx, ch ) as (
       select license_number, level, substr(license_number, level, 1)
       from   test_data
       connect by level <= length(license_number)
           and prior license_number = license_number
           and prior sys_guid() is not null
     ),
     prep ( license_number, idx, ch, grp ) as (
       select license_number, idx, ch,
              row_number() over (partition by license_number order by idx) -
                row_number() over (partition by license_number, ch order by idx)
       from   tokenized
     ),
     grouped ( license_number, ch, grp, ct ) as (
       select   license_number, ch, grp, count(*)
       from     prep
       group by license_number, ch, grp
     )
select   license_number, max(ct) as max_count
from     grouped
group by license_number
;

Output:

LICENSE_NUMBER  MAX_COUNT
--------------  ---------
AAAAAAAAA               9
123456789               1
A12345667               2
999999999               9
111246544               3
112211220               2

Upvotes: 1

user5683823
user5683823

Reputation:

Here is one way. To count how many occurrences of a character, say 'x', exist in a string like 'zxxydds', the fastest way is to use the string function REPLACE() to remove all the occurrences of 'x' from the string (by replacing 'x' with ''), and then to subtract the length of the resulting string from the length of the original string. A little care must be taken because if the string was all 'x' (like 'xxxxx') then the resulting string is empty and the length of the empty string in Oracle is NULL instead of zero. So we need to use a call to coalesce() to make such a length to be zero instead of NULL.

The rest is easy: Create a "fake table" of all the possible characters in a license number (for example, 0-9 and A-F, but you can generalize to any list of characters) - I do that with a simple table() function - then do a cross join, compute the number of occurrences of each character, and take the max() grouping by license_number. One final simplification I made: if C is a constant (a fixed number) and x ranges over a set, then max(C-x) = C - min(x).

with
     test_data ( license_number ) as (
       select '111246544' from dual union all
       select '999999999' from dual union all
       select '123456789' from dual union all
       select 'AAAAAAAAA' from dual union all
       select 'A12345667' from dual
     )
-- end of test data; solution (SQL query) begins below this line
select t.license_number,
       length(t.license_number) - 
         min(coalesce(length(replace(t.license_number, c.column_value, '')), 0)) 
                                                        as max_count
from   test_data t cross join
       table(sys.odcivarchar2list(
              '0','1','2','3','4','5','6','7','8','9','0','A','B','C','D','E','F')) c
group by t.license_number
;

LICENSE_NUMBER  MAX_COUNT
--------------  ---------
AAAAAAAAA               9
123456789               1
A12345667               2
999999999               9
111246544               3

Upvotes: 1

krokodilko
krokodilko

Reputation: 36127

Assuming that in this case: 123456789 the count of the most repeated character is 1 - not 0, then a query with few of subqueries does the work:

SELECT LICENSE_NUMBER, max( cnt )
FROM (
    SELECT LICENSE_NUMBER, substr( LICENSE_NUMBER, x, 1 ) qq, count(*) As cnt
    FROM (
        SELECT *
        FROM table
        CROSS JOIN (
           SELECT level x FROM dual
           -- get a max length of lincence with the belo subquery
           CONNECT BY LEVEL <= ( SELECT MAX( length( LICENSE_NUMBER )) FROM table )
        )
    )
    GROUP BY LICENSE_NUMBER, substr( LICENSE_NUMBER, x, 1 )
)
WHERE qq IS NOT NULL
GROUP BY LICENSE_NUMBER
ORDER BY 1;

If you prefer to print 0 instead of 1, just change the first line:

SELECT LICENSE_NUMBER, CASE max( cnt ) WHEN 1 THEN 0 ELSE max( cnt ) END 
FROM ....

Upvotes: 0

Kacper
Kacper

Reputation: 4818

That's not the prettiest code and if licence can contain more characters (I gues it is hexadecimal) it is going to be long query but you can try:

select licence_number, greatest(REGEXP_COUNT(licence_number, '1'), 
  REGEXP_COUNT(licence_number, '2'), REGEXP_COUNT(licence_number, '3'), 
  REGEXP_COUNT(licence_number, '4'), REGEXP_COUNT(licence_number, '5'), 
  REGEXP_COUNT(licence_number, '6'), REGEXP_COUNT(licence_number, '7'), 
  REGEXP_COUNT(licence_number, '8'), REGEXP_COUNT(licence_number, '9'), 
  REGEXP_COUNT(licence_number, '0'), REGEXP_COUNT(licence_number, 'A'),
  REGEXP_COUNT(licence_number, 'B'), REGEXP_COUNT(licence_number, 'C'),
  REGEXP_COUNT(licence_number, 'D'), REGEXP_COUNT(licence_number, 'E'),
  REGEXP_COUNT(licence_number, 'F') ) as max_count
from table;

Upvotes: 1

Related Questions