Regex match with capturing group excluding specific characters and looking for the last occurrence based on later characters

Question

I apologize in advance for the jumbled title, but it's the shortest way I could think of to describe what I'm trying to do.

I'm reading a file with multiple lines of text, looping through the lines, and trying to use a regex to extract a substring from each one. Each line starts with "name: ", followed by a series of letters and possibly hyphens. After that there may be a '#' followed by digits, a '-' followed by digits, or the end of the line. I only want to capture the letters and any hyphens inside the name. Below are my regex, the input, the captured output, and the intended output. The regex runs in a Linux bash script.

regex
name: (.[^\# \d]*)

input
name: foo-bar#2.3.2
name: bar-foo-4.2
name: foobar
name: far-far

captured outputs
foo-bar
bar-foo-
foobar
far-far

Intended outputs
foo-bar
bar-foo
foobar
far-far
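Bash's `=~` uses POSIX extended regular expressions, which do not define the PCRE shorthand `\d`; inside a bracket expression a backslash is literal, and the POSIX spelling for digits is `[[:digit:]]`. A sketch of the attempt rewritten with that class (the helper name `attempt_capture` is hypothetical) reproduces the captured outputs listed above, which isolates the real problem: '-' is allowed by the class, so the hyphen before `4.2` is kept.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the question's attempt, spelled with the POSIX digit class.
attempt_capture() {
    # '.' takes the first character, then anything except '#', space, or a digit.
    local re='name: (.[^# [:digit:]]*)'
    [[ $1 =~ $re ]] && printf '%s\n' "${BASH_REMATCH[1]}"
}

for s in 'name: foo-bar#2.3.2' 'name: bar-foo-4.2' 'name: foobar' 'name: far-far'; do
    attempt_capture "$s"    # prints foo-bar, bar-foo-, foobar, far-far
done
```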

Code sample:

fileRegex="name: (.[^\# \d]*)"
for i in "${fileList[@]}"
do
    # [[ requires a space before the first operand
    if [[ $i =~ $fileRegex ]]; then
        fixedLine="${BASH_REMATCH[1]}"
        echo "$fixedLine"
    fi
done

From the outputs above, the offending case is "name: bar-foo-4.2", which should output only "bar-foo" but instead outputs "bar-foo-". What I'm trying to figure out is how to stop capturing at a "-" followed by digits while keeping the outputs of all the other examples.
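The rule in the last sentence (cut at a '#' or '-' only when a digit follows) can also be expressed without a regex. A minimal sketch that swaps in bash parameter expansion with glob patterns instead of `=~`; the function name `strip_version` is hypothetical. It removes the "name: " prefix, then deletes the longest suffix that starts with '#' or '-' followed by a digit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name part of a "name: ..." line, or return 1.
strip_version() {
    local line=$1
    # Skip lines that do not start with the literal prefix.
    [[ $line == "name: "* ]] || return 1
    local rest=${line#"name: "}
    # %% removes the longest suffix matching the glob: a '#' or '-',
    # then a digit, then anything. A '-' followed by a letter is kept.
    printf '%s\n' "${rest%%[#-][0-9]*}"
}

for s in 'name: foo-bar#2.3.2' 'name: bar-foo-4.2' 'name: foobar' 'name: far-far'; do
    strip_version "$s"
done
```

Unlike a letters-and-hyphens regex, this keeps whatever precedes the first '#'/'-' plus digit, so it does not reject names containing other characters.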

anubhava · Accepted Answer

In bash you may try this code:

declare -a arr=([0]="name: foo-bar#2.3.2" [1]="name: bar-foo-4.2" [2]="name: foobar" [3]="name: far-far")
fileRegex='name: ([[:alpha:]]+(-[[:alpha:]]+)*)'
for s in "${arr[@]}"; do
   [[ $s =~ $fileRegex ]] && echo "${BASH_REMATCH[1]}"
done

Output:

foo-bar
bar-foo
foobar
far-far

RegEx Explained:

name: : Match "name: "
(: First capture group start
- [[:alpha:]]+: Match 1 or more letters
- (-[[:alpha:]]+)*: Match 0 or more repetitions of a hyphen followed by 1 or more letters, so a hyphen is captured only when a letter follows it
): First capture group end
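The answer's loop iterates over a hard-coded array, while the question reads lines from a file. A minimal sketch, assuming the lines arrive on standard input (the function name `extract_names` and the file name `names.txt` are hypothetical), applies the same regex line by line with `while IFS= read -r`; the comments note how the `(-[[:alpha:]]+)*` group handles a few edge cases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name captured from each "name: ..." line on stdin.
extract_names() {
    # Same pattern as the answer; kept in a variable so bash treats it as a regex.
    local re='name: ([[:alpha:]]+(-[[:alpha:]]+)*)'
    local line
    # IFS= and -r keep whitespace and backslashes intact;
    # the || test also processes a final line that lacks a newline.
    while IFS= read -r line || [[ -n $line ]]; do
        # A hyphen is consumed only when a letter follows it, so
        # "bar-foo-4.2" yields "bar-foo" and "foo--bar" yields "foo".
        # Lines with no letter right after "name: " print nothing.
        if [[ $line =~ $re ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}"
        fi
    done
}

# With a real file this would be: extract_names < names.txt
printf '%s\n' 'name: foo-bar#2.3.2' 'name: bar-foo-4.2' 'name: foobar' 'name: far-far' \
    | extract_names
```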
