Reputation: 86729

Regular expression to match any character being repeated more than 10 times

I'm looking for a simple regular expression to match the same character being repeated more than 10 or so times. So for example, if I have a document littered with horizontal lines:

=================================================

It will match the line of = characters because it is repeated more than 10 times. Note that I'd like this to work for any character.

Upvotes: 172

Answers (8)

user181548

The regex you need is /(.)\1{9,}/.

Test:

#!perl
use warnings;
use strict;
my $regex = qr/(.)\1{9,}/;
print "NO" if "abcdefghijklmno" =~ $regex;
print "YES" if "------------------------" =~ $regex;
print "YES" if "========================" =~ $regex;

Here \1 is a backreference. It matches whatever was captured by the dot . inside the parentheses (.), and {9,} then asks for nine or more further copies of that same character. Thus the pattern matches ten or more consecutive occurrences of any single character (by default, the dot does not match a newline).

Although the above test script is in Perl, this is very standard regex syntax and should work in most regex engines that support backreferences. In some variants you might need to use more backslashes, e.g. Emacs would make you write \(.\)\1\{9,\} here.

If a whole string should consist of 10 or more identical characters, add anchors around the pattern:

my $regex = qr/^(.)\1{9,}$/;
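
The off-by-one between the quantifier and the total run length is easy to get wrong, so here is a minimal Python sketch of the same idea. The helper name run_pattern and its min_len parameter are made up for illustration and are not part of the Perl script above. It builds (.)\1{min_len-1,} and contrasts an unanchored search with a whole-string match, where fullmatch roughly plays the role of the ^...$ anchors.

```python
import re

def run_pattern(min_len):
    """Build a regex matching min_len or more copies of one character.

    Hypothetical helper: the group consumes the first character, so the
    backreference only has to repeat min_len - 1 more times.
    """
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    return re.compile(r"(.)\1{%d,}" % (min_len - 1))

ten_or_more = run_pattern(10)    # same as (.)\1{9,}
more_than_ten = run_pattern(11)  # (.)\1{10,}, strictly more than 10

print(bool(ten_or_more.search("abc" + "=" * 10)))     # True
print(bool(more_than_ten.search("abc" + "=" * 10)))   # False
# fullmatch requires the whole string to be a single run
print(bool(ten_or_more.fullmatch("abc" + "=" * 10)))  # False
print(bool(ten_or_more.fullmatch("-" * 24)))          # True
```

If "more than 10" is meant strictly, the second pattern is probably the one you want.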

Upvotes: 238

js2010

Reputation: 27443

A slightly more generic PowerShell example. In PowerShell 7, the match is highlighted, including the trailing space (can you highlight on Stack Overflow?).

'a b c d e f ' | select-string '([a-f] ){6,}'

a b c d e f

Upvotes: 0

LihO

Reputation: 42083

PHP's preg_replace example:

$str = "motttherbb fffaaattther";
$str = preg_replace("/([a-z])\\1/", "", $str);
echo $str;

Here [a-z] matches a letter, the parentheses () capture it, and the \\1 backreference then matches the same letter again (note that this already targets 2 consecutive identical characters), thus:

mother father

If you did:

$str = preg_replace("/([a-z])\\1{2}/", "", $str);

that would erase runs of 3 consecutive identical characters, outputting:

moherbb her

Upvotes: 1

E.V.I.L.

Reputation: 2166

You can also use PowerShell to quickly replace word or character repetitions. At the time of writing, PowerShell was Windows-only and the current version was 3.0.

$oldfile = "$env:windir\WindowsUpdate.log"

$newfile = "$env:temp\newfile.txt"
$text = (Get-Content -Path $oldfile -ReadCount 0) -join "`n"

# -replace takes a bare regex; surrounding slashes would be matched literally
$text -replace '(.)\1{9,}', ' ' | Set-Content -Path $newfile

Upvotes: 1

Michał Niklas

Reputation: 54302

In Python you can use (.)\1{9,}

(.) captures a single character (any character except a newline) as group 1
\1{9,} matches nine or more further repetitions of the character captured by group 1

example:

import re

txt = """1. aaaaaaaaaaaaaaa
2. bb
3. cccccccccccccccccccc
4. dd
5. eeeeeeeeeeee"""
rx = re.compile(r'(.)\1{9,}')
lines = txt.split('\n')
for line in lines:
    rxx = rx.search(line)
    if rxx:
        print(line)

Output:

1. aaaaaaaaaaaaaaa
3. cccccccccccccccccccc
5. eeeeeeeeeeee
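
If you also want to know which character repeats and how long each run is, something along these lines could work. This is only a sketch: find_runs is a hypothetical helper name, not something provided by the re module, and it simply wraps re.finditer around the same pattern.

```python
import re

RUN = re.compile(r"(.)\1{9,}")  # ten or more of the same character

def find_runs(text):
    """Return (character, start, length) for each run found in text."""
    return [(m.group(1), m.start(), m.end() - m.start())
            for m in RUN.finditer(text)]

sample = "title\n" + "=" * 12 + "\nbody" + "-" * 3
for char, start, length in find_runs(sample):
    print(repr(char), start, length)  # prints: '=' 6 12
```

The three trailing dashes are ignored because the run is shorter than ten characters.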

Upvotes: 48

jeekl

Reputation: 362

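. matches any character. Used in conjunction with the curly braces already mentioned, \1{10} requires ten further copies of the captured character, so the pattern below matches runs of 11 or more, i.e. strictly more than 10. (Backreferences with grep -E are an extension; GNU grep supports them.)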

$: cat > test
========
============================
oo
ooooooooooooooooooooooo


$: grep -E '(.)\1{10}' test
============================
ooooooooooooooooooooooo

Upvotes: 7

dalloliogm

Reputation: 8940

Use the {10,} quantifier (this matches one specific character, here =):

$: cat > testre
============================
==
==============

$: grep -E '={10,}' testre
============================
==============

Upvotes: 0

SilentGhost

Reputation: 319601

={10,}

matches = that is repeated 10 or more times.
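
When the character is known but comes from a variable, it may need escaping first, since characters such as . or * are regex metacharacters. A minimal Python sketch of that, where char_run is a hypothetical helper name chosen for illustration:

```python
import re

def char_run(char, min_len=10):
    """Compile a pattern for min_len or more copies of one literal character."""
    if len(char) != 1:
        raise ValueError("expected a single character")
    # re.escape makes metacharacters such as '.' or '*' match literally
    return re.compile(re.escape(char) + "{%d,}" % min_len)

print(bool(char_run("=").search("=" * 10)))  # True
print(bool(char_run(".").search("a" * 20)))  # False: '.' is treated literally
print(bool(char_run("*").search("*" * 12)))  # True
```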

Upvotes: 2
