Ian Tait

Reputation: 607

Extract information from a file name using a bash regular expression

I need a regular expression to match and extract groups for a file name that will have the following format:

<artifactName>-<version>-<classifier>.<extension>

Where:

  1. <artifactName> can have dashes in it
  2. <version> must be of the format X, X.Y, X.X.Y, or X.X.X.Y, where X is any number of digits and Y is an alphanumeric string that can contain underscores
  3. <classifier> must be one of the following formats:
    a. <datestring>b<buildNumber>_<branch>
    b. <branch>
    where <datestring> is a 14 digit number, <buildNumber> is any number of digits, and <branch> is any alphanumeric string that can contain dashes or periods
  4. <extension> can be any alphanumeric string that can contain underscores

So far I have this regular expression, which works in online regex testers, but it fails when tested in a bash script:

^(.+)-((?:[[:digit:]]+\.){0,3}(?:[[:digit:]]+))-((?:([0-9]{14})b([[:digit:]]+)_([^\.]*))|(?:[^\.]*))\.(.+)$

The script I am using looks like this:

FILE_NAME='some-artifact-1.2.3.4-20180911123456b123_branch.ex.ten.sion'
REGEX='^(.+)-((?:[[:digit:]]+\.){0,3}(?:[[:digit:]]+))-((?:([0-9]{14})b([[:digit:]]+)_([^\.]*))|(?:[^\.]*))\.(.+)$'

if [[ "${FILE_NAME}" =~ ${REGEX} ]]
then
    echo "Artifact     = ${BASH_REMATCH[1]}"
    echo "Version      = ${BASH_REMATCH[2]}"
    echo "Classifier   = ${BASH_REMATCH[3]}"
    echo "Build Date   = ${BASH_REMATCH[4]}"
    echo "Build Number = ${BASH_REMATCH[5]}"
    echo "Branch       = ${BASH_REMATCH[6]}"
    echo "Extension    = ${BASH_REMATCH[7]}"
fi

I assume the regex engine that bash uses requires slightly different syntax, but I cannot figure out how to convert the regular expression that works in the online testers into one that works in bash.

Upvotes: 1

Views: 2447

Answers (1)

glenn jackman

Reputation: 246827

You can use shell parameter expansion instead. It's a bit verbose, but reliable.

FILE_NAME='some-artifact-1.2.3.4-20180911123456b123_branch.ex.ten.sion'

art_ver=${FILE_NAME%-*}
artifact=${art_ver%-*}
version=${art_ver##*-}

class_ext=${FILE_NAME##*-}
classification=${class_ext%%.*}
extension=${class_ext#*.}

printf "%s\n" "$artifact" "$version" "$classification" "$extension"

Output:

some-artifact
1.2.3.4
20180911123456b123_branch
ex.ten.sion
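
The same technique can probably be carried one step further to split the classifier into the build date, build number, and branch that the question asks for. The following is only a sketch: `split_classifier` and `is_digits` are hypothetical helper names, and it assumes that anything not shaped like <datestring>b<buildNumber>_<branch> should be treated as a bare <branch>.

```shell
#!/usr/bin/env bash

# Hypothetical helper: true if the argument is non-empty and all digits.
is_digits() { [[ -n $1 && $1 != *[!0-9]* ]]; }

# Hypothetical helper: sets build_date, build_number and branch (globals).
split_classifier() {
    local classification=$1 rest

    build_date=${classification%%b*}   # everything before the first "b"
    rest=${classification#*b}          # everything after the first "b"
    build_number=${rest%%_*}           # everything before the first "_"
    branch=${rest#*_}                  # everything after the first "_"

    # Form (a): 14-digit date, "b", digits, "_", branch.
    if (( ${#build_date} == 14 )) && is_digits "$build_date" \
        && is_digits "$build_number" && [[ $rest == *_* ]]; then
        return 0
    fi

    # Form (b): assume the whole classifier is the branch.
    build_date=''
    build_number=''
    branch=$classification
}

split_classifier '20180911123456b123_branch'
printf '%s\n' "$build_date" "$build_number" "$branch"
```

For the sample classifier this should print the date, the build number, and the branch on separate lines; for a classifier such as `feature-x` the first two lines would be empty.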

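Having read your requirements more carefully: if both the branch and the extension can contain dots, it is impossible to determine where the branch stops and the extension begins. The code above splits at the last dash and then at the first dot after it, so it assumes the branch contains neither dashes nor dots.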
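
As for why the original expression fails: bash's `[[ =~ ]]` uses POSIX extended regular expressions, which have no `(?:...)` non-capturing groups, so every group has to be a capturing one and the `BASH_REMATCH` indices shift accordingly. Below is a minimal sketch of a regex-based version under some assumptions that are not in the question: the branch contains no dots (as the original expression already implies with `[^\.]*`), the version is dot-separated digits only, and `parse_artifact_name` is a hypothetical helper name. The classifier is matched in a second step rather than with an alternation, which avoids relying on which alternative the regex library prefers.

```shell
#!/usr/bin/env bash

# Hypothetical helper: sets artifact, version, classifier, extension,
# build_date, build_number and branch (globals). Returns 1 on no match.
parse_artifact_name() {
    local name=$1
    # Every group captures in POSIX ERE, so the inner ([0-9]+\.) is index 3.
    local re='^(.+)-(([0-9]+\.){0,3}[0-9]+)-([^.]*)\.(.+)$'
    # Classifier form (a): <datestring>b<buildNumber>_<branch>
    local class_re='^([0-9]{14})b([0-9]+)_(.*)$'

    [[ $name =~ $re ]] || return 1
    artifact=${BASH_REMATCH[1]}
    version=${BASH_REMATCH[2]}
    classifier=${BASH_REMATCH[4]}
    extension=${BASH_REMATCH[5]}

    if [[ $classifier =~ $class_re ]]; then
        build_date=${BASH_REMATCH[1]}
        build_number=${BASH_REMATCH[2]}
        branch=${BASH_REMATCH[3]}
    else
        # Form (b): assume the whole classifier is the branch.
        build_date=''
        build_number=''
        branch=$classifier
    fi
}

FILE_NAME='some-artifact-1.2.3.4-20180911123456b123_branch.ex.ten.sion'
if parse_artifact_name "$FILE_NAME"; then
    echo "Artifact     = $artifact"
    echo "Version      = $version"
    echo "Classifier   = $classifier"
    echo "Build Date   = $build_date"
    echo "Build Number = $build_number"
    echo "Branch       = $branch"
    echo "Extension    = $extension"
fi
```

Note that the regular expressions are kept in variables and expanded unquoted on the right-hand side of `=~`; quoting them would make bash treat the pattern as a literal string. A branch that itself contains a dash-delimited run of digits would still be ambiguous with this pattern.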

Upvotes: 1
