Reputation: 23
I apologize beforehand for the jumbled mess that is the title, but that's the shortest way I could think of to describe what I'm trying to do.
I'm reading a file with multiple lines of text, looping through them, and trying to use a regex to get a substring from each line. Each line starts with "name: ", followed by a series of letters and possibly hyphens. After that, there may be a '#' followed by digits, a '-' followed by digits, or a newline. I only want to capture the letters and possible hyphens. Below is what I've tried, with input, output, and intended output. The regex is run in a Bash script on Linux.
regex |
---|
name: (.[^\#\r\n\d]*) |
input |
---|
name: foo-bar#2.3.2 |
name: bar-foo-4.2 |
name: foobar |
name: far-far |
captured outputs |
---|
foo-bar |
bar-foo- |
foobar |
far-far |
Intended outputs |
---|
foo-bar |
bar-foo |
foobar |
far-far |
Code sample:
fileRegex="name: (.[^\\#\r\n\d]*)"
for i in "${fileList[@]}"
do
  if [[ $i =~ $fileRegex ]]; then
    fixedLine="${BASH_REMATCH[1]}"
    echo "$fixedLine"
  fi
done
From the table, the offending instance is "name: bar-foo-4.2", which should output only "bar-foo" but instead outputs "bar-foo-". What I'm trying to figure out is how to stop capturing at a "-" followed by digits while keeping the outputs of all the other examples.
Upvotes: 1
Views: 37
Reputation: 785276
In bash, you may try this code:
declare -a arr=([0]="name: foo-bar#2.3.2" [1]="name: bar-foo-4.2" [2]="name: foobar" [3]="name: far-far")
fileRegex='name: ([[:alpha:]]+(-[[:alpha:]]+)*)'
for s in "${arr[@]}"; do
[[ $s =~ $fileRegex ]] && echo "${BASH_REMATCH[1]}"
done
Output:
foo-bar
bar-foo
foobar
far-far
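Since the question mentions reading the lines from a file, here is a minimal sketch of the same pattern applied line by line to standard input. The function names `extract_name` and `extract_names` and the file name `input.txt` are hypothetical, chosen only for illustration; lines that do not match are assumed to be skipped silently.

```bash
#!/usr/bin/env bash

# Hypothetical helper: prints the letters-and-hyphens part that follows
# "name: ", or returns 1 if the line does not match the pattern.
extract_name() {
  local re='name: ([[:alpha:]]+(-[[:alpha:]]+)*)'
  [[ $1 =~ $re ]] || return 1
  printf '%s\n' "${BASH_REMATCH[1]}"
}

# Hypothetical wrapper: reads lines from standard input and prints the
# captured name for each matching line; non-matching lines are skipped.
extract_names() {
  local line
  # "|| [[ -n $line ]]" also handles a last line without a trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    extract_name "$line" || true
  done
  return 0
}
```

With the functions defined, a call such as `extract_names < input.txt` would process a file directly. A hyphen is only captured when letters follow it, so the `-4.2` suffix is left out of the match.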
RegEx Explained:
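name: 
: Match "name: "
(
: First capture group start
[[:alpha:]]+
: Match 1+ alphabetic characters
(-[[:alpha:]]+)*
: Match a hyphen followed by 1+ alphabetic characters; repeat this group 0 or more times
)
: First capture group end
Upvotes: 2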