Hernn0


How to trim a field by index from the right of a string in Bash?

I would like to remove "(field 5)" from the following string:

test_string="[field 1 (field 2)] field 3 (field 4) (field 5) (field 6)"

Problems:

So far the only way I have found to do it is very dirty:

$ first_fields="$(printf '%s' "$test_string" | cut -d'(' -f -2)"

$ echo "$first_fields"
> [field 1 (field 2)] field 3

$ last_field="(${test_string##*\(}"

$ echo "$last_field"
> (field 6)

Problem here:

Question: how can I count fields from the right end of the string? Or am I pushing past the limits of what Unix shells can do?

I have tried the following, but I always get only one field: the entire string itself.

IFS="("
for i in "${test_string[@]}";
do
    echo "field is: $i"
done
> field is: [field 1 (field 2)] field 3 (field 4) (field 5) (field 6)
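The loop runs only once because test_string is a plain string, not an array: "${test_string[@]}" expands to a single word, and quoted expansions are never word-split on IFS. One way to get fields that can be counted from the right is to split the string into an array with read -a and index it from the end. This is a minimal sketch, assuming bash and assuming the trailing fields contain no "(" of their own (a stray "(" inside a field would shift the pieces):

```shell
#!/usr/bin/env bash
test_string="[field 1 (field 2)] field 3 (field 4) (field 5) (field 6)"

# Split on "(" into an array. IFS is changed only for this read command;
# -r keeps backslashes literal, -a stores the pieces in the array "parts".
IFS='(' read -r -a parts <<< "$test_string"

# Pieces: "[field 1 ", "field 2)] field 3 ", "field 4) ", "field 5) ", "field 6)"
count=${#parts[@]}
(( count >= 2 )) || exit 1
printf 'pieces: %d\n' "$count"
printf 'last field: (%s\n' "${parts[count-1]}"

# Drop the second piece from the right ("(field 5) " once re-joined),
# then join the remaining pieces with "(", the first character of IFS.
unset "parts[count-2]"
joined=$(IFS='('; printf '%s' "${parts[*]}")
printf '%s\n' "$joined"
```

Because "(" is a single ASCII byte, it never occurs inside a multibyte UTF-8 character, so splitting on it is safe for foreign-language text in the fields. On bash 4.3 or newer, ${parts[-1]} is a shorter way to write the last element.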

Note: the fields are always enclosed in parentheses and contain completely arbitrary characters every time (worse, they are in foreign languages, encoded in Unicode).
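Since the fields are always enclosed in parentheses, the ${test_string##*\(} expansion used above can simply be repeated to count fields from the right. The sketch below is a hypothetical helper (remove_field_from_right is an illustrative name, not a standard command) that uses only bash parameter expansion: it peels n-1 fields off the end, drops the n-th one, and glues the peeled fields back on. It assumes the trailing fields contain no "(" of their own; asking for a field that reaches into the [ ... ] prefix would cut at the wrong place.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove the n-th "(...)" field counted from the right
# (n=1 is the last field). Prints the original string and returns 1 if
# there are not enough "(" characters.
remove_field_from_right() {
  local s=$1 n=$2 tail="" i
  for (( i = 1; i < n; i++ )); do
    [[ $s == *\(* ]] || { printf '%s\n' "$1"; return 1; }
    tail="(${s##*\(}$tail"   # keep the current last field, re-adding its "("
    s=${s%\(*}               # and cut it off the working copy
  done
  [[ $s == *\(* ]] || { printf '%s\n' "$1"; return 1; }
  s=${s%\(*}                 # drop the n-th field itself
  if [[ -z $tail ]]; then
    # The very last field was removed: trim the blanks left before it.
    s=${s%"${s##*[![:space:]]}"}
  fi
  printf '%s\n' "$s$tail"
}

test_string="[field 1 (field 2)] field 3 (field 4) (field 5) (field 6)"
remove_field_from_right "$test_string" 2
```

Called with 2, it prints "[field 1 (field 2)] field 3 (field 4) (field 6)"; with 1 or 3 it removes "(field 6)" or "(field 4)" instead.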

Upvotes: 1


Answers (2)

anubhava


You can use sed:

$> test_string="[field 1 (field 2)] field 3 (field 4) (field 5) (field 6)"
$> sed -E 's/^(.*)\([^)]*\) (\([^)]*\))$/\1\2/' <<< "$test_string"
[field 1 (field 2)] field 3 (field 4) (field 6)

$> test_string="[field 1 (field 2)] field 3 (field 5) (field 6)"
$> sed -E 's/^(.*)\([^)]*\) (\([^)]*\))$/\1\2/' <<< "$test_string"
[field 1 (field 2)] field 3 (field 6)

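This sed command uses a regex to remove the second-to-last (...) group from the input: the greedy ^(.*) captures everything before that group, \2 captures the last (...) group, and only the group in between (plus the space after it) is dropped.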
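The same regex can be generalized to remove the n-th field from the right by turning the single trailing group into an interval {n-1}. The function below is a hypothetical wrapper (the name is illustrative), sketched under the same assumptions as the command above: fields look like (...) with no ")" inside, the trailing ones are separated by single spaces, and n >= 2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove the n-th "(...)" field counted from the right
# of each input line. \1 is everything before it, \2 the n-1 fields after it.
remove_nth_from_right_sed() {
  local n=$1
  if (( n < 2 )); then
    echo "n must be >= 2" >&2
    return 2
  fi
  local k=$(( n - 1 ))
  sed -E 's/^(.*)\([^)]*\) ((\([^)]*\) ?){'"$k"'})$/\1\2/'
}

test_string="[field 1 (field 2)] field 3 (field 4) (field 5) (field 6)"
remove_nth_from_right_sed 2 <<< "$test_string"   # same as the command above
remove_nth_from_right_sed 3 <<< "$test_string"
```

With n=2 this reduces to the original command; with n=3 it drops "(field 4)" and keeps "(field 5) (field 6)".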

Upvotes: 0

Fred


You can use a regular expression anchored to the end.

#!/bin/bash
test_string="[field 1 (field 2)] field 3 (field 4) (field 5) (field 6)"
rgx_field="[(].*[)]"
rgx_space="[[:space:]]*"
if
  [[ $test_string =~ (.*)$rgx_field$rgx_space($rgx_field)$rgx_space$ ]]
then
  result="${BASH_REMATCH[1]}${BASH_REMATCH[2]}" # Second-to-last field removed
else
  result=$test_string # No match... Buggy data?
fi
echo "$result"

This assumes fields are enclosed in parentheses, just like your sample code.

The key line is this:

[[ $test_string =~ (.*)$rgx_field$rgx_space($rgx_field)$rgx_space$ ]]

The =~ operator tries to match the string on the left against the extended regular expression on the right. The parts of the pattern inside parentheses tell the regex engine to "remember" what they matched; those captures are then available in the BASH_REMATCH array. The trailing $ anchors the regular expression to the end of the string, so that it works "backwards" from the last field. Because the initial (.*) is greedy, it absorbs all the leading fields, leaving only the last two for the rest of the pattern.
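To see what the engine "remembers", the captures can be printed right after the match. A small sketch reusing the answer's own variables:

```shell
#!/usr/bin/env bash
test_string="[field 1 (field 2)] field 3 (field 4) (field 5) (field 6)"
rgx_field="[(].*[)]"
rgx_space="[[:space:]]*"

if [[ $test_string =~ (.*)$rgx_field$rgx_space($rgx_field)$rgx_space$ ]]; then
  # Index 0 is the whole match; 1 and 2 are the two parenthesized groups.
  for i in "${!BASH_REMATCH[@]}"; do
    printf 'BASH_REMATCH[%d] = <%s>\n' "$i" "${BASH_REMATCH[i]}"
  done
fi
```

Group 1 ends with the space that preceded "(field 5)" and group 2 is "(field 6)", which is why concatenating the two groups directly gives correctly spaced output.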

Upvotes: 1
