tow mort
tow mort

Reputation: 1

sql: How to find the max length of consecutive sequence of given char in a string using sql

let the char be s and string be mississsipisssss . The sql should return 5.

I was thinking to makeup something using

regexp_count(string,'s+') would return the number of substrings whith consecutive sequence of s .

REGEXP_INSTR(string,'s',1,m) would return the position of the m'th occurence of char s(starting position of mth sequence)

REGEXP_INSTR(string,'s[^s]',1,m) would return the position of the m'th occurence of char s (ending position of mth sequence)

I havent still come to any solution.But I dont think this is a correct way,as the strings could be of any length, and there could be any number of such consecutive sequences. Could someone give any idea for solution (I am beginner)

Upvotes: 0

Views: 869

Answers (3)

Matt
Matt

Reputation: 14341

This will work in sql-server 2008 +. The same concept will work in Oracle you just need to add a couple of syntax differences such as SUBSTR instead of SUBSTRING. Here is a quick attempt of converting to Oracle syntax:

CREATE GLOBAL TEMPORARY TABLE TempTable
    (String VARCHAR(100)) ON COMMIT PRESERVE ROWS;

INSERT INTO TempTable (String) VALUES ('mississsipisssss');
INSERT INTO #TempTable (String) VALUES ('ssasdfs');

WITH cteTokens (String, IndexNum, Token, CharCount) AS  (
    SELECT
       String
       ,1 as IndexNum
       ,SUBSTR(t.String,1,1) as Token
       ,CASE WHEN SUBSTR(t.String,1,1) = 's' THEN 1 ELSE 0 END As CharCount
    FROM
       #TempTable t

    UNION ALL

    SELECT
       t.String
       IndexNum + 1 as IndexNum
       ,SUBSTR(t.String,IndexNum + 1,1) as Token
       ,CASE WHEN SUBSTR(t.String,IndexNum + 1,1) = 's' THEN t.CharCount + 1 ELSE 0 END AS CharCount
    FROM
       #TempTable s
       INNER JOIN cteTokens t
       ON s.String = t.String
       AND LENGTH(s.String) >= t.IndexNum + 1
)

SELECT
    String
    ,MAX(CharCount)
FROM
    cteTokens
GROUP BY
    String
;

And Here is a SQL-Server Version

CREATE TABLE #TempTable (String VARCHAR(100))
INSERT INTO #TempTable (String) VALUES ('mississsipisssss')
INSERT INTO #TempTable (String) VALUES ('ssasdfs')

;WITH cteTokens (String, IndexNum, Token, CharCount) AS  (
    SELECT
       t.String
       ,1 as IndexNum
       ,SUBSTRING(t.String,1,1) as Token
       ,CASE WHEN SUBSTRING(t.String,1,1) = 's' THEN 1 ELSE 0 END As CharCount
    FROM
       #TempTable t

    UNION ALL

    SELECT
       t.String
       ,IndexNum + 1 as IndexNum
       ,SUBSTRING(t.String,IndexNum + 1,1) as Token
       ,CASE WHEN SUBSTRING(t.String,IndexNum + 1,1) = 's' THEN t.CharCount + 1 ELSE 0 END As CharCount
    FROM
       #TempTable s
       INNER JOIN cteTokens t
       ON s.String = t.String
       AND LEN(s.String) >= t.IndexNum + 1
)

SELECT
    String
    ,MAX(CharCount)
FROM
    cteTokens
GROUP BY
    String

It is a Recursive Common Table Expression [CTE] that splits the string into character tokens in order of Index Position and tests to see if they are the character you desire. If the token is the character then it builds on the count from the previous token if is character so all you have to do is take the MAX() of the result and you have your answer.

Upvotes: 1

user5683823
user5683823

Reputation:

Here is a standard solution using a hierarchical query. The CASE expression in the outer query is needed to give the answer "null" when the input string is "null" (otherwise the answer would be 0, and it shouldn't be). I didn't add an ORDER BY clause - you can do so if needed. Good luck!

with
     inputs ( str ) as (
       select 'mississsipisssss' from dual union all
       select null               from dual union all
       select 'stress'           from dual union all
       select 'alphabeta'        from dual
     ),
     prep ( str, lvl ) as (
     select str, level
     from   inputs
     connect by  prior str = str
             and prior sys_guid() is not null
             and regexp_instr(str, 's{' || to_char(level-1) || '}') != 0
     )
select   str, case when str is not null then max(lvl) - 1 end as max_len
from     prep
group by str
;


STR                 MAX_LEN
---------------- ----------
(null)               (null)
alphabeta                 0
stress                    2
mississsipisssss          5

4 rows selected.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269973

This is not an operation well supported by databases, although Postres definitely has functionality in this area.

If you know there is some limit on the number of substrings of "s"s, then you could do something like this:

select greatest(coalesce(length(regexp(substr(string, '[s]+', 1, 1)), 0),
                coalesce(length(regexp(substr(string, '[s]+', 1, 2)), 0),
                coalesce(length(regexp(substr(string, '[s]+', 1, 3)), 0),
                coalesce(length(regexp(substr(string, '[s]+', 1, 4)), 0),
                coalesce(length(regexp(substr(string, '[s]+', 1, 5)), 0),
                coalesce(length(regexp(substr(string, '[s]+', 1, 6)), 0)
               )

This would find the first 6 substrings and then calculate the maximum length.

The alternative is essentially to split the string yourself using connect by or a recursive CTE. However, the above might be sufficient.

Upvotes: 0

Related Questions