Reputation: 17
I would like to match and extract many substrings using regex capture groups in a bash script on Linux.
This works with sed when I use a small number of capture groups. With a larger number of groups, however, it does not work properly.
This is my code:
txt="toknA: ABCDEFGGDSSSE toknB 1500 SEABCDEFGGDSSSEABCDEFGGDSSSE 1235643 CDEFGGDSSSE toknC 64 ABCDEFGGDSSSE ABCDEFGGDSSSE toknD 1000 ABCDEFGGDSSSE toknE 14306 toknF 16402238 toknG 0 toknH 0 toknI 0 toknJ 0 toknK 4930 toknL 333494 toknM fdvd swsw"
echo $txt | sed -r 's/^(toknA).*(toknB \d+).*(toknC \d+).*(toknD \d+).*(toknE \d+).*(toknF).*(toknG).*(toknH).*(toknI).*(toknJ).*(toknK).*(toknL)/\1 \2 \3 \4 \5 \6 \7 \8 \9 \10 \11 \12/'
This is what I have got:
toknA: ABCDEFGGDSSSE toknB 1500 SEABCDEFGGDSSSEABCDEFGGDSSSE 1235643 CDEFGGDSSSE toknC 64 ABCDEFGGDSSSE ABCDEFGGDSSSE toknD 1000 ABCDEFGGDSSSE toknE 14306 toknF 16402238 toknG 0 toknH 0 toknI 0 toknJ 0 toknK 4930 toknL 333494 toknM fdvd swsw
What I expected to get is:
toknA toknB 1500 toknC 64 toknD 1000 toknE 14306 toknF toknG toknH toknI toknJ toknK toknL
Any idea why this happens? Can it be solved another way?
Upvotes: 1
Views: 60
Reputation: 203324
With GNU awk for the 3rd arg to match():
$ echo "$txt" | awk '
    match($0,/^(toknA).*(toknB [0-9]+).*(toknC [0-9]+).*(toknD [0-9]+).*(toknE [0-9]+).*(toknF).*(toknG).*(toknH).*(toknI).*(toknJ).*(toknK).*(toknL)/,a) {
        for (i=1; i in a; i++) {
            printf "%s%s", (i>1? OFS : ""), a[i]
        }
        print ""
    }'
toknA toknB 1500 toknC 64 toknD 1000 toknE 14306 toknF toknG toknH toknI toknJ toknK toknL
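For context on why the original sed command left the line unchanged, here is a minimal sketch of two likely causes. sed only provides back-references \1 through \9, so \10 is read as \1 followed by a literal 0, and \d is a Perl-style class rather than a digit class in POSIX regular expressions, which is why the regex above uses [0-9]. The sample strings and the variable names ten and digits are made up for illustration.

```shell
#!/usr/bin/env bash
# sed only has back-references \1 through \9,
# so "\10" means "\1" followed by a literal "0".
ten=$(echo 'abcdefghij' | sed -E 's/(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)/\10/')
echo "$ten"      # prints a0, not j

# [0-9]+ (or [[:digit:]]+) is the portable way to match digits in an ERE.
digits=$(echo 'toknB 1500 rest' | sed -E 's/.*(toknB [0-9]+).*/\1/')
echo "$digits"   # prints toknB 1500
```

With twelve groups, a tool that exposes captures as an array (such as gawk's match() here) avoids the nine-group limit altogether.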
Upvotes: 0
Reputation: 246774
With just bash regex matching, [[ a =~ b ]]; captured pieces are stored in the BASH_REMATCH array.
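regex='(toknA)'
# toknB..toknE: capture the name plus the number that follows it
for x in {B..E}; do regex+=".*(tokn${x}[[:blank:]]+[[:digit:]]+)"; done
# toknF..toknL: capture only the name
for x in {F..L}; do regex+=".*(tokn${x})"; done
if [[ $txt =~ $regex ]]; then
    # index 0 holds the whole match; 1..12 hold the capture groups
    for i in "${!BASH_REMATCH[@]}"; do
        printf "%d\t%q\n" "$i" "${BASH_REMATCH[i]}"
    done
    echo
    result=${BASH_REMATCH[*]:1}  # join into a single string
    echo "$result"
fi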
outputs
0 toknA:\ ABCDEFGGDSSSE\ \ toknB\ 1500\ \ \ \ \ \ \ \ SEABCDEFGGDSSSEABCDEFGGDSSSE\ \ 1235643\ CDEFGGDSSSE\ \ \ \ \ \ \ toknC\ 64\ \ ABCDEFGGDSSSE\ \ \ \ \ \ \ \ ABCDEFGGDSSSE\ \ toknD\ 1000\ \ ABCDEFGGDSSSE\ \ \ \ \ \ \ \ toknE\ 14306\ \ toknF\ 16402238\ \ \ \ \ \ \ \ toknG\ 0\ \ toknH\ 0\ \ toknI\ 0\ \ toknJ\ 0\ \ \ \ \ \ \ \ toknK\ 4930\ \ toknL
1 toknA
2 toknB\ 1500
3 toknC\ 64
4 toknD\ 1000
5 toknE\ 14306
6 toknF
7 toknG
8 toknH
9 toknI
10 toknJ
11 toknK
12 toknL
toknA toknB 1500 toknC 64 toknD 1000 toknE 14306 toknF toknG toknH toknI toknJ toknK toknL
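If the order of the tokens does not need to be enforced, a different technique is to skip capture groups entirely and let grep -o print each wanted piece. This is only a sketch under assumptions: the input below is a shortened stand-in for the real line, and unlike the single anchored regex it does not check that every token is present or that they appear in sequence.

```shell
#!/usr/bin/env bash
# Shortened sample input (an assumption; the real line carries more filler text).
txt="toknA: AB toknB 1500 SE 1235643 toknC 64 AB toknD 1000 AB toknE 14306 toknF 16402238 toknG 0 toknH 0 toknI 0 toknJ 0 toknK 4930 toknL 333494 toknM fdvd swsw"

# grep -o prints each match on its own line.
# First alternative: toknB..toknE together with the number that follows.
# Second alternative: the bare name for toknA and toknF..toknL.
# LC_ALL=C keeps the bracket ranges byte-based; paste joins the lines with spaces.
result=$(printf '%s\n' "$txt" \
    | LC_ALL=C grep -oE 'tokn[B-E][[:blank:]]+[0-9]+|tokn[AF-L]' \
    | paste -sd' ' -)
echo "$result"
```

Because each piece is matched independently, there is no limit on the number of pieces, but any run of blanks inside a match (for example between toknB and its number) is kept as-is.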
Upvotes: 1