Reputation: 471
This is a second part to a previous question I'm asking. I have a table of license numbers and am trying to determine if they are legitimate. One of the tests that I'm trying to run is if the numbers are sequential/counting pattern. What would be really helpful is if I can get a max count of sequential characters, but I feel that is more difficult. For example:
LICENSE_NUMBER IS_COUNTING NUM_SEQ_CHARS
123456789 TRUE 9
123455678 FALSE 5
456789012 TRUE 9
12345 TRUE 5
123451234 FALSE 5
Thanks for your help!
Upvotes: 0
Views: 1145
Reputation:
Here is an efficient way to determine the maximum count of consecutive increasing digits in a given license number. I didn't bother with the YES/NO column - that can be obtained trivially, for example by comparing the MAX_COUNT
to length(license_number)
. Leaving that as an exercise.
We need to break up the license number into its individual characters, keeping track of position. Then we can subtract the "position" and take the residue modulo 10. Since Oracle believes that mod(-1, 10)
is -1 and not 9 (they really need to go back to elementary school to re-learn the MODULO
function), I need to add 10 so I get the correct result in all cases (explaining an "oddity" in the solution below). Now the problem becomes: Find the longest sequence of "equal" residues mod 10 for each license number.
Finding consecutive "something" in a sequence is done most efficiently with the so-called "Tabibitosan method" (the difference of two row_number()
s in the CTE I called prep
in my solution). Then it all becomes a matter of grouping, counting and taking a max()
.
with
-- begin of test data; not part of the solution
test_data ( license_number ) as (
select '123456789' from dual union all
select '123455678' from dual union all
select '456789012' from dual union all
select '12345' from dual union all
select '123451234' from dual union all
select '402023488' from dual union all
select '4189012' from dual
),
-- end of test data; solution (SQL query) continues below this line
tokenized ( license_number, idx, res ) as (
select license_number, level,
mod(10 + to_number(substr(license_number, level, 1)) - level, 10)
from test_data
connect by level <= length(license_number)
and prior license_number = license_number
and prior sys_guid() is not null
),
prep ( license_number, res, grp ) as (
select license_number, res,
row_number() over (partition by license_number order by idx) -
row_number() over (partition by license_number, res order by idx)
from tokenized
),
grouped ( license_number, res, grp, ct ) as (
select license_number, res, grp, count(*)
from prep
group by license_number, res, grp
)
select license_number, max(ct) as max_count
from grouped
group by license_number
;
Output:
LICENSE_NUMBER MAX_COUNT
-------------- ---------
123455678 5
123456789 9
456789012 9
123451234 5
4189012 5
12345 5
402023488 3
Upvotes: 1
Reputation: 36127
This query also calculates a count of sequential characters:
SELECT LICENSE_NUMBER,
CASE length( LICENSE_NUMBER ) WHEN max( cnt) THEN 'TRUE' ELSE 'FALSE' END As IS_COUNTING ,
max( cnt) As NUM_SEQ_CHARS
FROM (
SELECT LICENSE_NUMBER, p, count(*) As cnt
FROM (
SELECT LICENSE_NUMBER,
SUM( xx ) OVER ( Partition By LICENSE_NUMBER ORDER BY x ) As p
FROM (
SELECT LICENSE_NUMBER,
x, qq,
CASE WHEN qq - 1 = LAG( qq ) OVER ( Partition By LICENSE_NUMBER ORDER BY x )
THEN 0
WHEN qq = 0 AND 9 = LAG( qq ) OVER ( Partition By LICENSE_NUMBER ORDER BY x )
THEN 0
ELSE 1
END As xx
FROM (
SELECT LICENSE_NUMBER, x, substr( LICENSE_NUMBER, x, 1 ) qq
FROM (
SELECT *
FROM table
CROSS JOIN (
SELECT level x FROM dual
CONNECT BY LEVEL <= ( SELECT MAX( length( LICENSE_NUMBER )) FROM table )
)
)
)
)
)
GROUP BY LICENSE_NUMBER, p
)
GROUP BY LICENSE_NUMBER
Upvotes: 0
Reputation: 89231
^(01(2345678901)*(2(3(4(5(6(7(8(90?)?)?)?)?)?)?)?)?
|12(3456789012)*(3(4(5(6(7(8(9(01?)?)?)?)?)?)?)?)?
|23(4567890123)*(4(5(6(7(8(9(0(12?)?)?)?)?)?)?)?)?
|34(5678901234)*(5(6(7(8(9(0(1(23?)?)?)?)?)?)?)?)?
|45(6789012345)*(6(7(8(9(0(1(2(34?)?)?)?)?)?)?)?)?
|56(7890123456)*(7(8(9(0(1(2(3(45?)?)?)?)?)?)?)?)?
|67(8901234567)*(8(9(0(1(2(3(4(56?)?)?)?)?)?)?)?)?
|78(9012345678)*(9(0(1(2(3(4(5(67?)?)?)?)?)?)?)?)?
|89(0123456789)*(0(1(2(3(4(5(6(78?)?)?)?)?)?)?)?)?
|90(1234567890)*(1(2(3(4(5(6(7(89?)?)?)?)?)?)?)?)?)$
It will match any string of two or more digits, that consists entirely of sequential digits.
123456789 MATCH
123455678 NOT MATCH
456789012 MATCH
12345 MATCH
Demo: http://regexr.com/3enkb
Upvotes: 1