davy307

Reputation: 319

Regex to detect if character is repeated more than three times

I've tried to follow the solution described here: https://stackoverflow.com/a/17973873/2149915 to match a string with the following requirement: a character repeated more than 3 times sequentially in the string should be matched and returned.

Examples:

And so on and so forth. The idea is to detect text that is nonsensical.

So far my solution was to modify the regex in the link as follows:

ORIGINAL: ^(?!.*([A-Za-z0-9])\1{2})(?=.*[a-z])(?=.*\d)[A-Za-z0-9]+$

ADAPTED: ^(?!.*([A-Za-z0-9\.\,\/\|\\])\1{3})$

Essentially I removed the lookaheads that require a lowercase letter and a digit, seen here: (?=.*[a-z])(?=.*\d)[A-Za-z0-9]+, and tried to add detection of extra characters such as . , / | \ etc., but it doesn't seem to match anything at all.

Any ideas on how I can achieve this?

Thanks in advance :)

EDIT: I found this regex: ^.*(\S)(?: ?\1){9,}.*$ on this question https://stackoverflow.com/a/44659071/2149915 and have adapted it to match on only 3 repetitions, like so: ^.*(\S)(?: ?\1){3}.*$.

Now it detects things like:

However, it does not take into account repeated whitespace such as this:

Is there a modification that can be done to achieve this?

Upvotes: 6

Views: 2976

Answers (2)

Wiktor Stribiżew

Reputation: 627327

To disallow four or more consecutive identical chars in the string, you need

^(?!.*(.)\1{3,}).*

See the regex demo. If you do not allow an empty string, replace the last .* with .+. Details:

  • ^ - start of string
  • (?!.*(.)\1{3,}) - a negative lookahead that fails the match if there are zero or more chars other than line break chars, as many as possible, followed by a char captured into Group 1 that is in turn followed by three or more occurrences of the same char
  • .* - any zero or more chars other than line break chars, as many as possible (not necessary if you use a method that does not require a full string match, like Matcher#find in Java or Regex.containsMatchIn in Kotlin).
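For comparison, here is a minimal Java sketch of the same pattern. It contrasts a full-string check (Matcher#matches, which needs the trailing .*) with a partial check (Matcher#find, which does not). The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

public class RepeatLookaheadDemo {
    // Pattern from above, with the trailing .* so that matches() can consume the input.
    static final Pattern FULL = Pattern.compile("^(?!.*(.)\\1{3,}).*");
    // Same lookahead without the trailing .*; enough when using find().
    static final Pattern PARTIAL = Pattern.compile("^(?!.*(.)\\1{3,})");

    // Full-string match: true when no char occurs four or more times in a row.
    public static boolean isValidFullMatch(String text) {
        return FULL.matcher(text).matches();
    }

    // Partial match: only the anchored lookahead has to succeed.
    public static boolean isValidFind(String text) {
        return PARTIAL.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] texts = {"hello how are you...", "hiii", "hello how are you.............", "hiiiiii"};
        for (String text : texts) {
            System.out.println(text + ": " + isValidFullMatch(text) + " / " + isValidFind(text));
        }
    }
}
```

Both methods should agree on single-line input. They can differ on input containing line breaks, because matches() must consume the whole string and . does not match line break chars by default.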

Here is a Kotlin demo (for a change):

import java.util.*
 
fun main(args: Array<String>) {
    val texts = arrayOf<String>("hello how are you...","hiii", "hello how are you.............","hiiiiii")
    val re = """^(?!.*(.)\1{3,}).*""".toRegex()
    for(text in texts) {    
      val isValid = re.containsMatchIn(text)
      println("${text}: ${isValid}")    
    }
}

Output:

hello how are you...: true
hiii: true
hello how are you.............: false
hiiiiii: false

NOTE:

If you do not want to limit the check to consecutive repetitions, modify the pattern above as follows:

^(?!.*(.)(?:.*?\1){3,}).*

See this regex demo. The (?:.*?\1){3,} regex matches three or more occurrences of any zero or more chars other than line break chars as few as possible and then the Group 1 value.
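A small Java sketch of this non-consecutive variant, assuming a full-string check with Matcher#matches. The class name, method name and sample inputs are invented for illustration.

```java
import java.util.regex.Pattern;

public class TotalRepeatDemo {
    // Non-consecutive variant: a char may not occur four or more times
    // anywhere in the string (first occurrence plus three more).
    static final Pattern AT_MOST_THREE_TOTAL =
            Pattern.compile("^(?!.*(.)(?:.*?\\1){3,}).*");

    public static boolean isValid(String text) {
        return AT_MOST_THREE_TOTAL.matcher(text).matches();
    }

    public static void main(String[] args) {
        // Sample inputs are made up for illustration.
        String[] texts = {"banana", "bananarama", "hello how are you", "hiii"};
        for (String text : texts) {
            System.out.println(text + ": " + isValid(text));
        }
    }
}
```

Here "banana" passes because a occurs only three times, while "bananarama" is rejected because a occurs five times, even though no two of them are adjacent.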

To match across line breaks, replace . with [\s\S] or add (?s) at the start of the pattern.

To limit the repetitions to letters, replace (.) with ([a-zA-Z]) or (\p{L}), and if you need to only check repeated digits, replace (.) with (\d) or ([0-9]).
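The variants above can be compared side by side in a short Java sketch. It uses Matcher#find, so the trailing .* is left out. The constant names, the helper method and the sample inputs are assumptions made for this example.

```java
import java.util.regex.Pattern;

public class RepeatVariantsDemo {
    // Any char except line breaks (default behaviour of the dot).
    public static final String ANY_CHAR = "^(?!.*(.)\\1{3,})";
    // (?s) lets the dot match line breaks too.
    public static final String ANY_CHAR_DOTALL = "(?s)^(?!.*(.)\\1{3,})";
    // Only repeated letters are rejected.
    public static final String LETTERS = "^(?!.*(\\p{L})\\1{3,})";
    // Only repeated digits are rejected.
    public static final String DIGITS = "^(?!.*(\\d)\\1{3,})";

    // True when the text contains no forbidden run for the given pattern.
    public static boolean isValid(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String multiline = "a\n\n\n\nb"; // four line breaks in a row
        System.out.println(isValid(ANY_CHAR, multiline));        // line breaks are not checked
        System.out.println(isValid(ANY_CHAR_DOTALL, multiline)); // line breaks are checked
        System.out.println(isValid(LETTERS, "hello how are you............."));
        System.out.println(isValid(LETTERS, "hiiiiii"));
        System.out.println(isValid(DIGITS, "call 5555"));
        System.out.println(isValid(DIGITS, "hiiiiii"));
    }
}
```

With the letters-only pattern a long run of dots is accepted, and with the digits-only pattern "hiiiiii" is accepted, since neither run consists of the restricted character class.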

Upvotes: 1

Mena

Reputation: 48434

I think there's a much simpler solution if you're looking for any character repeated more than 3 times:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String[] inputs = {
            "hello how are you...", // -> VALID
            "hello how are you.............", // -> INVALID
            "hiii", // -> VALID
            "hiiiiii", // -> INVALID
            "hello    how are you" // -> INVALID (four consecutive spaces)
        };
        //                            | group 1 - any character
        //                            | | back-reference
        //                            | |   | {3,}: 3+ more copies, so 4+ including the captured one
        //                            | |   |     | DOTALL: dot represents any character,
        //                            | |   |     | including whitespace and line feeds
        //                            | |   |     |
        Pattern p = Pattern.compile("(.)\\1{3,}", Pattern.DOTALL);
        // iterating test inputs
        for (String s : inputs) {
            // matching
            Matcher m = p.matcher(s);
            // 4+ repeated character found
            if (m.find()) {
                System.out.printf(
                    "Input '%s' not valid, character '%s' repeated more than 3 times%n",
                    s,
                    m.group(1)
                );
            }
        }
    }
}

Output

Input 'hello how are you.............' not valid, character '.' repeated more than 3 times
Input 'hiiiiii' not valid, character 'i' repeated more than 3 times
Input 'hello    how are you' not valid, character ' ' repeated more than 3 times

Upvotes: 3
