Tudor Timi

Reputation: 7573

bash regex to parse text of the form +incdir+<dir1>+<dir2>

I have an input string of the form +incdir+<dir1>+<dir2>, where <dir1> and <dir2> are directory names. I want to parse this using a bash regex and have the values of the directories inside BASH_REMATCH[1], [2], ...

Here is what I tried:

function match {
  if [[ "$1" =~ \+incdir(\+.*)+ ]]; then
    for i in $(seq $(expr ${#BASH_REMATCH[@]} - 1)); do
      echo $i ":" ${BASH_REMATCH[$i]}
    done
  else
    echo "no match"
  fi
}

This works for match +incdir+foo, but not for match +incdir+foo+bar: the matching is greedy, so the single group comes out as +foo+bar. Bash regexes have no non-greedy matching (as the question "regex in bash expression" mentions), so I tried the following pattern instead: \+incdir(\+[^+]*)+ but this just gives me +bar.

The way I would interpret the regex is the following: find the beginning +incdir, then match me at least one group starting with a + followed by as many characters as you can find that are not +. When you hit a + this is the start of the next group. I guess my reasoning is incorrect.

Does anyone have any idea what I'm doing wrong?

Upvotes: 0

Views: 212

Answers (1)

Charles Duffy

Reputation: 295650

Using only bash builtins (but NOT regular expressions, which are the wrong tool for this job):

match() {
    local -a pieces                              # keep the array out of the global scope
    [[ $1 = *+incdir+* ]] || return              # noop if no +incdir+ present
    IFS=+ read -r -a pieces <<<"${1#*+incdir+}"  # read everything after +incdir+
                                                 # into +-separated array
    for idx in "${!pieces[@]}"; do               # iterate over keys in array
      echo "$idx: ${pieces[$idx]}"               # ...and emit key/value pairs
    done
}

$ match "yadda yadda +incdir+foo+bar+baz"
0: foo
1: bar
2: baz
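
As for why the original pattern only yields +bar: a capture group that is repeated with + still has a single slot in BASH_REMATCH, and that slot holds only the text from the group's last repetition. If a regex really is required, one workaround is to match a single piece per loop iteration. The following is a minimal sketch of that idea, not a definitive implementation; the function name match_regex and the global array dirs are hypothetical names chosen for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fills the global array "dirs" with the
# +-separated pieces that follow +incdir in the first argument.
match_regex() {
    local rest
    local head_re='\+incdir(\+.*)$'    # captures everything after +incdir
    local piece_re='^\+([^+]*)(.*)$'   # one piece, then whatever remains
    dirs=()
    [[ $1 =~ $head_re ]] || return 1   # fail if no +incdir+ is present
    rest=${BASH_REMATCH[1]}
    # Peel off one "+piece" per iteration until nothing is left.
    while [[ $rest =~ $piece_re ]]; do
        dirs+=("${BASH_REMATCH[1]}")
        rest=${BASH_REMATCH[2]}
    done
}
```

After calling match_regex "+incdir+foo+bar", dirs should hold foo and bar as separate elements. The patterns are stored in variables so that the shell does not reinterpret the backslashes and parentheses when they are used on the right-hand side of =~.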

Upvotes: 2
