Reputation: 21
I'm pretty new to bash scripting and regexp and have a question.
I want to check whether my variable `$name` starts with a-d, e-h, i-l, etc., and do some stuff accordingly. If the string starts with "the." or "The.", it should check the first letter after the period.
My problem is that if `$name` is "the.anchor", both the a-d0-9 test and the q-t test are true. Do you guys have any idea what's wrong?
```bash
if [[ $name =~ ^([tT]he\.)?[a-dA-D0-9]+ ]]; then
    echo "a-d"   # do some stuff
fi
if [[ $name =~ ^([tT]he\.)?[e-hE-H]+ ]]; then
    echo "e-h"   # do some stuff
fi
if [[ $name =~ ^([tT]he\.)?[i-lI-L]+ ]]; then
    echo "i-l"   # do some stuff
fi
if [[ $name =~ ^([tT]he\.)?[m-pM-P]+ ]]; then
    echo "m-p"   # do some stuff
fi
if [[ $name =~ ^([tT]he\.)?[q-tQ-T]+ ]]; then
    echo "q-t"   # do some stuff
fi
if [[ $name =~ ^([tT]he\.)?[u-wU-W]+ ]]; then
    echo "u-w"   # do some stuff
fi
if [[ $name =~ ^([tT]he\.)?[x-zX-Z]+ ]]; then
    echo "x-z"   # do some stuff
fi
```
Thanks in advance!
Upvotes: 2
Reputation: 3646
I think the `?` can be removed, as the `if` statement is already doing the test. The `+` matches the preceding item one or more times and would only be needed if you wanted to match more than one instance of the letters.
You can do it like this:
```bash
if [[ $name =~ ^[tT]he\.[a-dA-D0-9] ]]; then
    echo "a-d"   # do some stuff
fi
```
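The condition will only return true if the first character after `^[tT]he\.` is in `[a-dA-D0-9]`. Keep in mind that without the `?`, a name that does not start with "the." or "The." no longer matches at all.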
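A quick way to see what this pattern accepts is to run it over a few sample names. This is a minimal sketch; `starts_the_a_to_d` is a hypothetical helper name and the sample names are made up. Because the prefix is now required, plain `anchor` is rejected along with `the.tiger`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds only when the name starts with "the." or "The."
# followed by a-d, A-D or a digit (the pattern above, without ? and +).
starts_the_a_to_d() {
    [[ $1 =~ ^[tT]he\.[a-dA-D0-9] ]]
}

for name in the.anchor The.Dune the.tiger anchor; do
    if starts_the_a_to_d "$name"; then
        echo "$name: match"
    else
        echo "$name: no match"
    fi
done
```

With these inputs it reports a match for `the.anchor` and `The.Dune` and no match for `the.tiger` and `anchor`.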
However, I tend to think `case` is a cleaner solution than `if` statements when matching lists of characters against variables.
```bash
case $name in
    [tT]he\.[a-dA-D0-9]*)
        echo "a-d"   # do some stuff
        ;;
esac
```
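Extending the `case` idea to all the ranges, one possible sketch is to strip an optional "the."/"The." prefix first with bash's `${var#pattern}` parameter expansion (a technique not used in the answer above). Every clause then only looks at the first remaining character, and since the ranges do not overlap, the order of the clauses no longer matters. `letter_group` and its output labels are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints which letter group a name belongs to,
# ignoring one leading "the." or "The.".
letter_group() {
    # Remove the prefix if present; in a glob pattern "." is a literal dot.
    local rest=${1#[tT]he.}
    case $rest in
        [a-dA-D0-9]*) echo "a-d" ;;
        [e-hE-H]*)    echo "e-h" ;;
        [i-lI-L]*)    echo "i-l" ;;
        [m-pM-P]*)    echo "m-p" ;;
        [q-tQ-T]*)    echo "q-t" ;;
        [u-wU-W]*)    echo "u-w" ;;
        [x-zX-Z]*)    echo "x-z" ;;
        *)            echo "other" ;;
    esac
}

letter_group the.anchor   # prints a-d
letter_group The.Zebra    # prints x-z
letter_group tiger        # prints q-t
```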
Upvotes: 0
Reputation: 21
I figured out a way to fix my problem by using `elif` statements and putting the q-t test last, so it is only reached when none of the other ranges matched.
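For reference, here is a sketch of what that `elif` approach might look like. Only the first matching branch runs, and because the q-t test comes last, the bare `t` of "the." is only picked up when nothing after the prefix matched an earlier range. The `classify` function name and its labels are hypothetical, and the trailing `+` is dropped because, without an end anchor, one character is enough. One quirk remains: a name whose character after "the." is in none of the ranges (for example `the.#1`) still lands in the q-t branch via the leading `t`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: the first matching branch wins; q-t is tested last.
classify() {
    local name=$1
    if [[ $name =~ ^([tT]he\.)?[a-dA-D0-9] ]]; then
        echo "a-d"
    elif [[ $name =~ ^([tT]he\.)?[e-hE-H] ]]; then
        echo "e-h"
    elif [[ $name =~ ^([tT]he\.)?[i-lI-L] ]]; then
        echo "i-l"
    elif [[ $name =~ ^([tT]he\.)?[m-pM-P] ]]; then
        echo "m-p"
    elif [[ $name =~ ^([tT]he\.)?[u-wU-W] ]]; then
        echo "u-w"
    elif [[ $name =~ ^([tT]he\.)?[x-zX-Z] ]]; then
        echo "x-z"
    elif [[ $name =~ ^([tT]he\.)?[q-tQ-T] ]]; then
        echo "q-t"
    fi
}

classify the.anchor   # prints a-d
```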
Upvotes: 0
Reputation: 61198
The first part of your pattern is optional: `([tT]he\.)?`
So `the.anchor` matches the pattern `^([tT]he\.)?[a-dA-D0-9]+` because `the.` matches `^([tT]he\.)?` and the `a` matches `[a-dA-D0-9]+`. It also matches `^([tT]he\.)?[q-tQ-T]+` because `([tT]he\.)?` is optional, so it can match nothing, and then `t` matches `[q-tQ-T]+`. Note that neither pattern consumes the whole input; the second one in fact matches only the first character.
You can verify this by having bash echo the match:

```bash
echo "${BASH_REMATCH[0]}"
```

This should print `the.a` in the first case and `t` in the second.
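A small self-contained version of that check, run against `the.anchor` from the question, prints both matched substrings (a sketch; the `echo` labels are only for illustration):

```bash
#!/usr/bin/env bash
name="the.anchor"

# First pattern: the optional group takes "the.", then [a-dA-D0-9]+ takes only "a".
if [[ $name =~ ^([tT]he\.)?[a-dA-D0-9]+ ]]; then
    echo "a-d matched: ${BASH_REMATCH[0]}"   # prints: a-d matched: the.a
fi

# Second pattern: the optional group matches nothing and [q-tQ-T]+ takes only "t".
if [[ $name =~ ^([tT]he\.)?[q-tQ-T]+ ]]; then
    echo "q-t matched: ${BASH_REMATCH[0]}"   # prints: q-t matched: t
fi
```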
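You do not have an end anchor on the pattern, so only part of the input needs to match. If you changed the second pattern to `^([tT]he\.)?[q-tQ-T]+$`, it would not match. Be aware that `$` then requires every character after the optional prefix to be in the bracket expression, so `^([tT]he\.)?[a-dA-D0-9]+$` would reject `the.anchor` as well.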
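Alternatively, you could make the first part possessive: `^([tT]he\.)?+`. This means that once the engine has matched the group, it will not give it back. In the second case, `^([tT]he\.)?+` grabs the `the.` and does not release it when `[q-tQ-T]+` fails, so the whole match fails. Note, however, that possessive quantifiers are a feature of PCRE-style engines: bash's `=~` uses POSIX extended regular expressions, which do not support them, so this works in tools such as GNU `grep -P` but not inside `[[ ... ]]`.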
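Since bash's `[[ ... =~ ... ]]` uses POSIX extended regular expressions, which have no possessive quantifiers, one way to get the same "keep the prefix once matched" behavior is to consume the prefix explicitly and then test only the remainder. This is a minimal sketch under that assumption; `starts_q_to_t` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: if "the."/"The." is present, commit to it and test
# only what follows; otherwise test the whole name.
starts_q_to_t() {
    local rest=$1
    if [[ $1 =~ ^[tT]he\.(.*)$ ]]; then
        rest=${BASH_REMATCH[1]}   # everything after the prefix
    fi
    [[ $rest =~ ^[q-tQ-T] ]]
}

for name in the.anchor the.tiger tiger; do
    if starts_q_to_t "$name"; then
        echo "$name: q-t"
    else
        echo "$name: not q-t"
    fi
done
```

Here `the.anchor` is reported as not q-t, while `the.tiger` and `tiger` are.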
Upvotes: 2