David Tonhofer

Reputation: 15316

Bash regex match using '=~' operator unexpectedly fails

Here is a small unit-test script for Bash's regex match operator, =~:

#!/bin/bash

# From "man bash"
# An additional binary operator, =~, is available, with the same
# precedence as == and !=. When it is used, the string to the right of
# the operator is considered an extended regular expression  and  matched
# accordingly (as  in regex(3)).  The return value is 0 if the string
# matches the pattern, and 1 otherwise.  If the regular expression
# is syntactically incorrect, the conditional expression's return value
# is 2.

# The above should say regex(7) of course

match() {
   local REGEX=$1
   local VAL=$2
   [[ $VAL =~ $REGEX  ]]
   RES=$?
   case $RES in
      0) echo "Match of '$VAL' against '$REGEX': MATCH" >&2 ;;
      1) echo "Match of '$VAL' against '$REGEX': NOMATCH" >&2 ;;
      2) echo "Error in regex expression '$REGEX'" >&2 ;;
      *) echo "Unknown returnvalue $RES" >&2 ;;
   esac
   echo $RES
}

v() {
   SHALL=$1
   IS=$2
   if [ "$SHALL" -eq "$IS" ]; then echo "OK"; else echo "NOT OK"; fi
}

unit_test() {
   v 0 "$(match A                A  )"
   v 0 "$(match A.               AB )"
   v 0 "$(match A[:digit:]?      A  )"
   v 0 "$(match A[:digit:]       A6 )"
   v 0 "$(match \"A[:digit:]*\"  A6 )"  # enclosing in quotes needed otherwise fileglob happens
   v 0 "$(match A[:digit:]+      A6 )"
   v 0 "$(match A                BA )"
   v 1 "$(match ^A               BA )"
   v 0 "$(match ^A               Ab )"
   v 0 "$(match 'A$'             BA )"
   v 1 "$(match 'A$'             Ab )"
}

unit_test

This looks pretty straightforward, but running it yields:

Match of 'A' against 'A': MATCH
OK
Match of 'AB' against 'A.': MATCH
OK
Match of 'A' against 'A[:digit:]?': MATCH
OK
Match of 'A6' against 'A[:digit:]': NOMATCH
NOT OK
Match of 'A6' against 'A[:digit:]*': MATCH
OK
Match of 'A6' against 'A[:digit:]+': NOMATCH
NOT OK
Match of 'BA' against 'A': MATCH
OK
Match of 'BA' against '^A': NOMATCH
OK
Match of 'Ab' against '^A': MATCH
OK
Match of 'BA' against 'A$': MATCH
OK
Match of 'Ab' against 'A$': NOMATCH
OK

One would expect

Match of 'A6' against 'A[:digit:]'

and

Match of 'A6' against 'A[:digit:]+'

to succeed.

What am I doing wrong?

Upvotes: 4

Views: 953

Answers (3)

hek2mgl

Reputation: 157947

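You are using [:digit:] in the wrong context. Character classes like this are meant to be used inside a bracket expression, for example [[:digit:][:alnum:]._+-]. Without the outer brackets, [:digit:] is itself read as a bracket expression that matches any one of the characters :, d, i, g and t. That is why the ? and * variants in your test still matched (zero occurrences are allowed), while the plain and + variants did not.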

It should be:

if [[ "A6" =~ A[[:digit:]] ]] ; then
    echo "match"
fi
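A quick way to check how the regex engine reads the pattern with and without the outer brackets is a sketch like the one below. The `try` helper is hypothetical (it is not part of the question's script), and the results in the comments assume a POSIX-conforming regex library such as glibc's.

```shell
#!/bin/bash
# Hypothetical helper: prints MATCH or NOMATCH for value $2 against the ERE in $1.
try() {
   local regex=$1 val=$2
   if [[ $val =~ $regex ]]; then echo "MATCH"; else echo "NOMATCH"; fi
}

try 'A[:digit:]'    'A6'   # NOMATCH: 6 is not one of : d i g t
try 'A[:digit:]'    'Ad'   # MATCH:   d is one of : d i g t
try 'A[:digit:]?'   'A'    # MATCH:   ? allows zero occurrences
try 'A[[:digit:]]'  'A6'   # MATCH:   a real digit class
try 'A[[:digit:]]+' 'A6'   # MATCH
```

The second line is the telling one: 'Ad' matches the pattern that was supposed to require a digit.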

Upvotes: 2

David Tonhofer

Reputation: 15316

As suggested in the comments, the ShellCheck tool shows what the problem is:

[Image: the output of ShellCheck]

Upvotes: 1

Inian

Reputation: 85560

Remember to enclose character classes in brackets [] so that they are matched as a list of characters, i.e. write [[:digit:]]:

string="A6"
[[ $string =~ A[[:digit:]] ]]
echo $?
0

See the documentation on bracket expressions for more.
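Applied to the digit tests from the question, the fix might look like the following sketch. The `check` function is a hypothetical, compact stand-in for the question's `match` and `v` helpers. The patterns are single-quoted at the call site so that the shell does not try to glob-expand them before they reach the function.

```shell
#!/bin/bash
# Hypothetical compact variant of the question's match/v helpers:
# prints OK when the exit status of [[ value =~ regex ]] equals the expected one.
check() {
   local expected=$1 regex=$2 val=$3 res
   [[ $val =~ $regex ]]
   res=$?
   if [ "$res" -eq "$expected" ]; then echo "OK"; else echo "NOT OK"; fi
}

check 0 'A[[:digit:]]?'  A
check 0 'A[[:digit:]]'   A6
check 0 'A[[:digit:]]*'  A6
check 0 'A[[:digit:]]+'  A6
check 1 'A[[:digit:]]'   Ab   # no digit after A, so a non-match is expected
```

With the doubled brackets, every line should report OK.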

Upvotes: 3
