Ladih
Ladih

Reputation: 811

How to match repeated characters using regular expression operator =~ in bash?

I want to know if a string has repeated letter 6 times or more, using the =~ operator.

a="aaaaaaazxc2"
if [[ $a =~ ([a-z])\1{5,} ]];
then
     echo "repeated characters"
fi

The code above does not work.

Upvotes: 4

Views: 6228

Answers (3)

Dima Chubarov
Dima Chubarov

Reputation: 17169

You do not need the full power of backreferences for this specific case of one character repeats. You could just build the regex that would check for a repeat of every single lower case letter

regex="a{6}"
for x in {b..z} ; do regex="$regex|$x{6}" ; done    
if [[ "$a" =~ ($regex) ]] ; then echo "repeated characters" ; fi

The regex built with the above for loop looks like

> echo "$regex" | fold -w60
a{6}|b{6}|c{6}|d{6}|e{6}|f{6}|g{6}|h{6}|i{6}|j{6}|k{6}|l{6}|
m{6}|n{6}|o{6}|p{6}|q{6}|r{6}|s{6}|t{6}|u{6}|v{6}|w{6}|x{6}|
y{6}|z{6}

This regular expression behaves as you would expect

> if [[ "abcdefghijkl" =~ ($regex) ]] ; then \
  echo "repeated characters" ; else echo "no repeat detected" ; fi
no repeat detected
> if [[ "aabbbbbbbbbcc" =~ ($regex) ]] ; then \
  echo "repeated characters" ; else echo "no repeat detected" ; fi
repeated characters

Updated following the comment from @sln replaced bound {6,} expression with a simple {6}.

Upvotes: 2

anubhava
anubhava

Reputation: 785176

BASH regex flavor i.e. ERE doesn't support backreference in regex. ksh93 and zsh support it though.

As an alternate solution, you can do it using extended regex option in grep:

a="aaaaaaazxc2"
grep -qE '([a-zA-Z])\1{5}' <<< "$a" && echo "repeated characters"

repeated characters

EDIT: Some ERE implementations support backreference as an extension. For example Ubuntu 14.04 supports it. See snippet below:

$> echo $BASH_VERSION
4.3.11(1)-release

$> a="aaaaaaazxc2"
$> re='([a-z])\1{5}'
$> [[ $a =~ $re ]] && echo "repeated characters"
repeated characters

Upvotes: 4

Charles Duffy
Charles Duffy

Reputation: 295443

[[ $var =~ $regex ]] parses a regular expression in POSIX ERE syntax.

See the POSIX regex standard, emphasis added:

BACKREF - Applicable only to basic regular expressions. The character string consisting of a character followed by a single-digit numeral, '1' to '9'.

Backreferences are not formally specified by the POSIX standard for ERE; thus, they are not guaranteed to be available (subject to platform-specific libc extensions) in bash's native regex syntax, thus mandating the use of external tools (awk, grep, etc).

Upvotes: 2

Related Questions