How to trim a field by index from the right of a string in Bash?

Question

I would like to remove "(field 5)" from the following string:

test_string="[field 1 (field 2)] field 3 (field 4) (field 5) (field 6)"

Problems:

Sometimes (field 4) doesn't even exist.
I just want to keep (field 6) at the end of the string no matter what.
Sometimes there are no fields at all after "field 3", in which case I just keep the string as is, i.e. [field 1 (field 2)] field 3

So far the only way I have found to do it is very dirty:

$ first_fields="$(printf '%s' "$test_string" | cut -d'(' -f -2)"

$ echo "$first_fields"
> [field 1 (field 2)] field 3

$ last_field="(${test_string##*\(}"

$ echo "$last_field"
> (field 6)

Problem here:

If I have a variable number of fields, I can't give cut -f a hard-coded field number, otherwise I'll lose (field 4).
All I need is to keep the very last (field) at the right end of the string, no matter what it is.

Question: how to count fields from the right end of the string? Or am I pushing over the limit of Unix shells' capabilities?

I have tried the following but I always get one field only which is the entire string itself:

IFS="("
for i in "${test_string[@]}";
do
    echo "field is: $i"
done
> [field 1 (field 2)] field 3 (field 4) (field 5) (field 6)

Note: the fields are always between parentheses and contain totally random characters every time (worse, they are in foreign languages encoded in Unicode).

Fred · Accepted Answer

You can use a regular expression anchored to the end.

#!/bin/bash
test_string="[field 1 (field 2)] field 3 (field 4) (field 5) (field 6)"
rgx_field="[(].*[)]"
rgx_space="[[:space:]]*"
if
  [[ $test_string =~ (.*)$rgx_field$rgx_space($rgx_field)$rgx_space$ ]]
then
  result="${BASH_REMATCH[1]}${BASH_REMATCH[2]}" # Second-to-last field removed
else
  result=$test_string # No match... Buggy data?
fi
echo "$result"

This assumes fields are enclosed in parentheses, just like your sample code.
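
If the field contents never contain parentheses of their own (an assumption the question does not confirm), a tighter field pattern can make that explicit. This is only a sketch of a variation on the script above; the names rgx_strict, sample and strict_result are made up for illustration.

```bash
#!/bin/bash
# Assumption: a field never contains "(" or ")" inside it.
rgx_strict="[(][^()]*[)]"
rgx_space="[[:space:]]*"
sample="[field 1 (field 2)] field 3 (field 4) (field 5) (field 6)"

if [[ $sample =~ (.*)$rgx_strict$rgx_space($rgx_strict)$rgx_space$ ]]; then
  # Leading text plus the last field; the second-to-last field is dropped.
  strict_result="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
else
  # Fewer than two trailing fields: keep the string unchanged.
  strict_result=$sample
fi
echo "$strict_result"
```

With this pattern a single field can no longer stretch across several parenthesized groups, which may be safer when the data is messy.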

The key line is this:

[[ $test_string =~ (.*)$rgx_field$rgx_space($rgx_field)$rgx_space$ ]]

The =~ operator tries to match the string on the left against the extended regular expression on the right. The bare parentheses in the pattern are capture groups: they tell the regex engine to "remember" whatever matched inside them, and those parts are then available in the BASH_REMATCH array (index 0 holds the whole match, 1 and 2 hold the two groups). The parentheses inside [(] and [)] are different: they match literal characters. The trailing $ anchors the expression to the end of the string, so it effectively works "backwards" from the last field. Because .* is greedy, the initial (.*) takes as many of the leading fields as it can, leaving only the last two fields for the rest of the pattern.
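
To check the situations described in the question, the same test can be wrapped in a function and run against each case. This is a minimal sketch; the function name drop_penultimate_field is hypothetical, and the second and third inputs are variations on the sample string rather than data from the question.

```bash
#!/bin/bash
# Hypothetical helper: drop the second-to-last parenthesized field and
# keep the last one. Prints the input unchanged if the pattern does not match.
drop_penultimate_field() {
  local s=$1
  local rgx_field="[(].*[)]"
  local rgx_space="[[:space:]]*"
  if [[ $s =~ (.*)$rgx_field$rgx_space($rgx_field)$rgx_space$ ]]; then
    # BASH_REMATCH[1] = leading text, BASH_REMATCH[2] = last field
    printf '%s\n' "${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
  else
    printf '%s\n' "$s"
  fi
}

# All fields present: (field 5) should be removed.
drop_penultimate_field "[field 1 (field 2)] field 3 (field 4) (field 5) (field 6)"
# (field 4) missing: (field 5) should still be removed.
drop_penultimate_field "[field 1 (field 2)] field 3 (field 5) (field 6)"
# No trailing fields: the string should come back as is.
drop_penultimate_field "[field 1 (field 2)] field 3"
```

Note that when only one trailing field follows "field 3", the pattern does not match and the string is returned untouched, so the last field is kept.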
