Tremors

Reputation: 133

Regular Expression to follow a specific pattern

I'm trying to make sure the input to my shell script follows the format Name_Major_Minor.extension

where Name is any number of digits/characters/"-" followed by "_"

Major is any number of digits followed by "_"

Minor is any number of digits followed by "."

and Extension is any number of characters followed by the end of the file name.

I'm fairly certain my regular expression is just slightly off. Any file I currently run through it evaluates to "yes", but if I end the pattern with "[A-Z]$" instead of "*$" it always evaluates to "no". Regular expressions confuse the hell out of me, as you can probably tell.

if echo $1 | egrep -q [A-Z0-9-]+_[0-9]+_[0-9]+\.*$
then
    echo "yes"
else
    echo "nope"
    exit
fi

Edit: I realized I was missing the pattern for "minor". It still doesn't work after adding it, though.

Upvotes: 2

Views: 1630

Answers (2)

Ruslan Osmanov

Reputation: 21492

Use =~ operator

Bash supports regular expression matching through its =~ operator, and there is no need for egrep in this particular case:

if [[ "$1" =~ ^[A-Za-z0-9-]+_[0-9]+_[0-9]+\..*$ ]]
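A minimal sketch of how that test could be wrapped for reuse, assuming Bash (the =~ operator is not available in plain sh). The function name is_valid_name and the sample file names are made up for illustration. Keeping the pattern in a variable avoids quoting surprises inside [[ ]].

```shell
#!/usr/bin/env bash
# is_valid_name is a hypothetical helper, not part of the original script.
is_valid_name() {
    # Name_Major_Minor.extension, anchored at both ends.
    local re='^[A-Za-z0-9-]+_[0-9]+_[0-9]+\..*$'
    # The variable is left unquoted on purpose so Bash treats it as a regex.
    [[ "$1" =~ $re ]]
}

# Sample names, chosen only for illustration.
is_valid_name "foo-bar_1_2.txt" && echo "yes"
is_valid_name "foo_1.txt" || echo "nope"
```

The function returns the status of the match, so it can be used directly in an if statement in place of the echo/egrep pipeline.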

Errors in your regular expression

The \.*$ sequence in your regular expression means "zero or more dots". You probably meant "a dot and some characters after it", i.e. \..*$.

Your regular expression is anchored only at the end of the string ($), so it accepts any input whose tail fits the pattern. You likely want to match the whole string; to do that, add the ^ anchor, which matches the beginning of the line.
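To make both problems concrete, here is a small comparison of the original pattern and the corrected one, using grep -E (the modern spelling of egrep). The helper name matches and the sample input are assumptions for illustration only.

```shell
# Original pattern from the question (quoted) and the corrected, fully anchored one.
unanchored='[A-Z0-9-]+_[0-9]+_[0-9]+\.*$'
anchored='^[A-Za-z0-9-]+_[0-9]+_[0-9]+\..*$'

# matches is a hypothetical helper: $1 = pattern, $2 = candidate file name.
matches() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

# The tail "X_1_2" fits the unanchored pattern, and "\.*" is satisfied by zero dots,
# so an input with junk in front and no extension is still accepted.
matches "$unanchored" 'not valid X_1_2' && echo "unanchored: yes"
matches "$anchored" 'not valid X_1_2' || echo "anchored: nope"
```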

Escape the command line arguments

If you still want to use egrep, quote the pattern, as you should any command-line argument containing special characters; otherwise the shell may reinterpret it (for example, by removing backslashes or expanding globs) before egrep sees it. Wrap the argument in single or double quotes, e.g.:

if echo "$1" | egrep -q '^[A-Za-z0-9-]+_[0-9]+_[0-9]+\..*$'
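The effect of the missing quotes can be seen by printing the argument as a command actually receives it. This is only a sketch; show_arg is a hypothetical helper, and the first result assumes no file in the current directory happens to match the unquoted word as a glob.

```shell
# show_arg is a hypothetical helper that prints its first argument verbatim.
show_arg() { printf '%s\n' "$1"; }

# Unquoted: the shell removes the backslash (and may expand the word as a glob),
# so the command typically receives [0-9]+.*$ where "." means "any character".
show_arg [0-9]+\.*$

# Single-quoted: the pattern arrives untouched, backslash included.
show_arg '[0-9]+\.*$'
```

Single quotes are the safer default for regular expressions, since double quotes still let the shell expand $ and backslash sequences in some positions.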

Use printf instead of echo

Avoid echo for arbitrary input: its behavior varies between shells, and it may treat arguments such as -n as options or interpret backslash sequences. Use printf instead:

printf '%s\n' "$1"
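A short illustration of the difference, using an input that looks like an echo option. The value -n is chosen only for the demonstration; what echo does with it depends on the shell.

```shell
# An argument that happens to look like an echo option (illustrative value).
name='-n'

# Many shells, Bash included, treat -n as an option here and print nothing.
via_echo=$(echo "$name")

# printf with an explicit format prints the argument literally.
via_printf=$(printf '%s\n' "$name")

printf 'echo gave:   [%s]\n' "$via_echo"
printf 'printf gave: [%s]\n' "$via_printf"
```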

Upvotes: 4

Nicolas

Reputation: 7081

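Try this regex instead: ^[A-Za-z0-9-]+(?:_[0-9]+){2}\..+$. Note that (?:...) is Perl-style syntax, which is not part of the POSIX extended syntax used by egrep and Bash's =~; there, write a plain group (...) instead, or use grep -P where it is available.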

  • [A-Za-z0-9-]+ matches Name
  • _[0-9]+ matches _ followed by one or more digits
  • (?:...){2} matches the group two times: _Major_Minor
  • \..+ matches a period followed by one or more characters

The problem in your regex seems to be at the end: \.* matches a period (\.) any number of times, including zero. Also, [A-Z0-9-] only matches uppercase letters, digits and hyphens, which might not be what you wanted.
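As a sketch of how the {2} form could be used from a shell script, the version below swaps the non-capturing group for a plain group so that it works with grep -E, which follows POSIX extended syntax. The helper name check and the sample file names are assumptions for illustration.

```shell
# Plain (capturing) group instead of (?:...), which POSIX ERE does not define.
pattern='^[A-Za-z0-9-]+(_[0-9]+){2}\..+$'

# check is a hypothetical helper that prints yes/nope for one file name.
check() {
    if printf '%s\n' "$1" | grep -Eq -- "$pattern"; then
        echo "yes"
    else
        echo "nope"
    fi
}

check 'my-lib_3_14.tar.gz'   # yes
check 'my-lib_3.tar.gz'      # nope: only one numeric field
check 'my-lib_3_14.'         # nope: .+ needs at least one character after the dot
```

Unlike \..*, the \..+ ending rejects names with an empty extension.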

Upvotes: 1
