MOHAMED

Reputation: 43616

How to replace a pattern in a string in bash

I have the following strings stored in shell variables:

a1="aaa,bbb3,12,ccc,"
a2="aaa,2,bbb,ccc,"

I want to run a single command on these variables to remove the object number from the path. An object number is a number between two commas.

So the result should be

a1="aaa,bbb3,ccc,"
a2="aaa,bbb,ccc,"

How can I do that?

I tried:

echo ${a1//,[0-9],/,} ==> wrong result
echo ${a2//,[0-9],/,} ==> good result

I also have the following strings stored in shell variables:

a1="aaa,bbb3,#zu_45,ccc,"
a2="aaa,#mn,bbb,ccc,"
a3="aaa,bbb,ccc,#kn,"

I also want to run a single command on these variables to remove every object that starts with # from the path. An object is the text located between two commas.

So the result should be

a1="aaa,bbb3,ccc,"
a2="aaa,bbb,ccc,"
a3="aaa,bbb,ccc,"

How can I do that?

Upvotes: 2

Views: 2686

Answers (5)

UrsaDK

Reputation: 865

A bash-only solution:

... remove the object number from the path ...

$ shopt -s extglob
$ a1="aaa,bbb3,12,ccc,"
$ echo "${a1//,+([0-9]),/,}"
aaa,bbb3,ccc,

... remove the object which start with # from the path ...

$ a1="aaa,bbb3,#zu_45,ccc,"
$ echo ${a1//,#*([^,]),/,}
aaa,bbb3,ccc,

The above two solutions can be combined into a single operation:

$ a1="aaa,32,bbb3,#zu_45,ccc,"
$ echo ${a1//,@(#*([^,])|+([0-9])),/,}
aaa,bbb3,ccc,

All of these solutions rely on the shell option extglob being set (shopt -s extglob). They use bash extended patterns to match parts of the string:

  • +([0-9]) - Matches one or more digits;
  • #*([^,]) - Matches any string that starts with a hash [#] and is followed by any number of not-commas;
  • @(...|...) - Matches exactly one of the given patterns, which are separated by the pipe-symbol.
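
Putting the pieces together, a minimal sketch could look like the one below. The function name strip_fields is made up for illustration and is not part of bash; the pattern is the combined one shown above.

```bash
#!/usr/bin/env bash
# extglob must be enabled before the extended patterns are used
shopt -s extglob

# strip_fields: hypothetical helper. Prints its argument with every
# all-digit field and every field starting with '#' removed.
strip_fields() {
    local s=$1
    printf '%s\n' "${s//,@(#*([^,])|+([0-9])),/,}"
}

strip_fields "aaa,bbb3,12,ccc,"     # aaa,bbb3,ccc,
strip_fields "aaa,#mn,bbb,ccc,"     # aaa,bbb,ccc,
strip_fields "aaa,bbb,ccc,#kn,"     # aaa,bbb,ccc,
```

Because the pattern requires a comma on both sides, a matching field at the very start of the string (for example "12,aaa,") is left untouched by this sketch.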

Upvotes: 0

gniourf_gniourf

Reputation: 46903

Not exactly a one-liner, but a bullet-proof method:

IFS=, read -d '' -ra ary <<< "$variable"
unset "ary[${#ary[@]}-1]"

This creates an array ary whose elements are the comma-separated fields of your variable (named variable here). The here-string appends a newline, which ends up as a spurious last field, hence the unset. Then, if you want to remove all the numbers:

for i in "${!ary[@]}"; do
    [[ ${ary[i]} = +([[:digit:]]) ]] && unset "ary[$i]"
done

or to remove all the elements that start with a hash:

for i in "${!ary[@]}"; do
    [[ ${ary[i]} = \#* ]] && unset "ary[$i]"
done

Finally, to put that back into your variable:

printf -v variable '%s,' "${ary[@]}"
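
Why the last element gets dropped may not be obvious, so here is a small sketch that counts the fields before and after. The variable names count_before and count_after and the "|| true" guard are additions for illustration, not part of the method above.

```bash
#!/usr/bin/env bash
variable="aaa,bbb3,12,ccc,"

# read -d '' reaches end of input without finding its NUL delimiter and
# therefore returns non-zero; "|| true" (added in this sketch) keeps that
# from aborting a script that runs under set -e.
IFS=, read -d '' -ra ary <<< "$variable" || true

count_before=${#ary[@]}        # four real fields plus one holding only "\n"
unset "ary[${#ary[@]}-1]"      # drop the field created by the here-string newline
count_after=${#ary[@]}

echo "$count_before -> $count_after"
printf -v rebuilt '%s,' "${ary[@]}"
echo "$rebuilt"
```

The rebuilt string equals the original, which suggests the split and join steps are lossless before any filtering is applied.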

A function that does that:

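remove_number() {
    # $1 is the name of the variable that contains the list
    local ary i
    IFS=, read -d '' -ra ary <<< "${!1}"
    unset "ary[${#ary[@]}-1]"    # drop the newline added by the here-string
    for i in "${!ary[@]}"; do
        [[ ${ary[i]} = +([[:digit:]]) ]] && unset "ary[$i]"
    done
    printf -v "$1" '%s,' "${ary[@]}"
}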

Demo (assuming this function is defined in session):

$ a1="aaa,bbb3,12,ccc,"
$ remove_number a1
$ echo "$a1"
aaa,bbb3,ccc,

This also works if the variable contains newlines. Note that it doesn't remove the negative number; that wasn't really clear from your requirements, but it's trivial to modify the function to handle this too (see is_number below):

$ a=$'aaa,\n\nnewline here\n,123456,-1234,abc\n,'
$ echo "$a"
aaa,

newline here
,123456,-1234,abc
,
$ remove_number a
$ echo "$a"
aaa,

newline here
,-1234,abc
,

Oh, and it works if the number appears in the first field (unlike some other answers):

$ a='1234,abc,'
$ b='1234,'
$ remove_number a
$ remove_number b
$ echo "a='$a'"; echo "b='$b'"
a='abc,'
b=','

Also fine for "empty" lists:

$ a=','
$ remove_number a
$ echo "$a"
,

A more functional approach: make a function that removes a field if a condition, given by a function, is met:

remove_cond() {
    # $1 is the name of the condition function
    # $2 is the name of the variable that contains the list
    local ary i
    IFS=, read -d '' -ra ary <<< "${!2}"
    unset "ary[${#ary[@]}-1]"    # drop the newline added by the here-string
    for i in "${!ary[@]}"; do
        "$1" "${ary[i]}" && unset "ary[$i]"
    done
    printf -v "$2" '%s,' "${ary[@]}"
}

Then a couple of conditions:

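is_number() {
    # an optional minus sign followed by one or more digits
    [[ $1 = ?(-)+([[:digit:]]) ]]
}
is_bang() {
    # despite the name, this matches a leading hash (#), not a bang (!)
    [[ $1 = \#* ]]
}
is_bang_or_number() {
    is_number "$1" || is_bang "$1"
}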

With these defined in the current session:

$ a="aaa,bbb3,#zu_45,ccc,"
$ remove_cond is_bang a
$ echo "$a"
aaa,bbb3,ccc,
$ a='1234,#lol,keep me please,-1234,'
$ remove_cond is_bang_or_number a
$ echo "$a"
keep me please,

Upvotes: 0

ghoti

Reputation: 46896

For your first question, you can use extglob for this.

$ a1="aaa,bbb3,12,ccc,"
$ a2="aaa,2,bbb,ccc,"
$ shopt -s extglob
$ echo "${a1//,+([0-9]),/,}"
aaa,bbb3,ccc,
$ echo "${a2//,+([0-9]),/,}"
aaa,bbb,ccc,
$

It's well documented in Greg's wiki and many other places.


For your second question, you can handle this either with more advanced pattern matching, or by treating the comma-separated elements as fields and processing them individually.

First, a pattern match solution.

$ a1="aaa,bbb3,#zu_45,ccc,"
$ a2="aaa,#mn,bbb,ccc,"
$ a3="aaa,bbb,ccc,#kn,"
$ echo "${a1//,#+([^,]),/,}"
aaa,bbb3,ccc,
$ echo "${a2//,#+([^,]),/,}"
aaa,bbb,ccc,
$ echo "${a3//,#+([^,]),/,}"
aaa,bbb,ccc,
$

(I'll just point out for any wandering readers that while patterns like this may look like regular expressions, they are not.)
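
One caveat with the pattern approach: each match consumes the comma on both sides, so when two matching fields sit next to each other, a single pass skips the second one. A rough sketch of a workaround is to repeat the substitution until nothing changes. The function name strip_numbers is hypothetical.

```bash
#!/usr/bin/env bash
shopt -s extglob

s="aaa,1,2,bbb,"
once=${s//,+([0-9]),/,}    # a single pass leaves the "2" behind
echo "$once"               # aaa,2,bbb,

# strip_numbers: hypothetical helper that repeats the substitution
# until the string stops changing.
strip_numbers() {
    local s=$1 prev
    while :; do
        prev=$s
        s=${s//,+([0-9]),/,}
        [[ $s == "$prev" ]] && break
    done
    printf '%s\n' "$s"
}

strip_numbers "aaa,1,2,3,bbb,"    # aaa,bbb,
```

The loop terminates because every pass that changes the string also shortens it.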

Second, a solution that treats fields like fields would of course involve a loop. Here's an example as a one-liner:

$ a=(${a1//,/ }); unset newa; for i in "${a[@]}"; do if [[ ! $i =~ ^# ]]; then newa="$newa${newa:+,}$i"; fi; done; echo "$newa"
aaa,bbb3,ccc

You can repeat this for the other variables.

Note that this depends on your data not containing any spaces (or other whitespace, or glob characters such as *), since the unquoted expansion used to fill the array is split on whitespace.
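
If the data might contain spaces, one possible variation is to split on commas only, using read with a temporary IFS. This is a sketch under the assumption that the string contains no newlines; the names fields and f are chosen here for illustration.

```bash
#!/usr/bin/env bash
a1="aaa,bbb 3,#zu_45,ccc,"    # note the space inside the second field

# Split on commas only; IFS is changed just for this one read command.
IFS=, read -ra fields <<< "$a1"

newa=
for f in "${fields[@]}"; do
    [[ $f == \#* ]] && continue   # skip fields that start with a hash
    newa+="$f,"                   # re-add the comma after each kept field
done

echo "$newa"    # aaa,bbb 3,ccc,
```

Unlike the one-liner above, this version keeps the trailing comma, matching the format in the question.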

Upvotes: 2

Tom Fenech

Reputation: 74705

Using pure bash, you can do:

$ re='(.*),[0-9]+,(.*)'
$ a1="aaa,bbb3,12,ccc,"
$ a2="aaa,2,bbb,ccc,"
$ [[ $a1 =~ $re ]] && echo "${BASH_REMATCH[1]},${BASH_REMATCH[2]}"
aaa,bbb3,ccc,
$ [[ $a2 =~ $re ]] && echo "${BASH_REMATCH[1]},${BASH_REMATCH[2]}"
aaa,bbb,ccc,

Those echo commands could be turned into assignments, giving you what you wanted.

Rather than doing a search and replace, you can just capture the parts before and after the match using ( ); they are stored in the special array variable BASH_REMATCH.
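
A single match removes only one field (the last one, since the first group is greedy). As a sketch of how the assignment form might handle several numeric fields, the match can be repeated in a loop; the sample string s is made up for illustration.

```bash
#!/usr/bin/env bash
re='(.*),[0-9]+,(.*)'
s="aaa,1,bbb,22,ccc,"

# Each iteration removes one numeric field, so the string gets shorter
# every time and the loop stops once the regex no longer matches.
while [[ $s =~ $re ]]; do
    s="${BASH_REMATCH[1]},${BASH_REMATCH[2]}"
done

echo "$s"    # aaa,bbb,ccc,
```

If the string contains no numeric field at all, the loop body never runs and s is left unchanged.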

For the second one, you could do something similar:

$ re='(.*),#[^,]*,(.*)'
$ a1="aaa,bbb3,#zu_45,ccc,"
$ a2="aaa,#mn,bbb,ccc,"
$ a3="aaa,bbb,ccc,#kn,"
$ [[ $a1 =~ $re ]] && echo "${BASH_REMATCH[1]},${BASH_REMATCH[2]}"
aaa,bbb3,ccc,
$ [[ $a2 =~ $re ]] && echo "${BASH_REMATCH[1]},${BASH_REMATCH[2]}"
aaa,bbb,ccc,
$ [[ $a3 =~ $re ]] && echo "${BASH_REMATCH[1]},${BASH_REMATCH[2]}"
aaa,bbb,ccc,

Using a [^,] (not a comma) character class ensures that the rest of the string isn't consumed.

update

You could turn this into a function like this:

$ rm_hash () {
>     local re='(.*),#[^,]*,(.*)'
>     local subject="$1"
>     [[ $subject =~ $re ]] && echo "${BASH_REMATCH[1]},${BASH_REMATCH[2]}"
> }
$ a1="aaa,bbb3,#zu_45,ccc,"
$ rm_hash "$a1"
aaa,bbb3,ccc,

Upvotes: 1

anubhava

Reputation: 786291

You can use this sed:

sed -E 's/,(#[^,]*|[0-9]+),/,/g' file

a1="aaa,bbb3,ccc,"
a2="aaa,bbb,ccc,"
a3="aaa,bbb,ccc,"
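
Since the question keeps the strings in shell variables rather than in a file, a sed-based approach could also be fed through a here-string. This is only a sketch: the helper name strip_with_sed is hypothetical, and the expression used here is an assumption that anchors each field on both of its commas.

```bash
#!/usr/bin/env bash
# strip_with_sed: hypothetical helper. Removes all-digit fields and
# fields starting with '#' from a comma-separated string.
strip_with_sed() {
    sed -E 's/,(#[^,]*|[0-9]+),/,/g' <<< "$1"
}

a1="aaa,bbb3,#zu_45,ccc,"
a2="aaa,2,bbb,ccc,"
a1=$(strip_with_sed "$a1")
a2=$(strip_with_sed "$a2")
echo "$a1"    # aaa,bbb3,ccc,
echo "$a2"    # aaa,bbb,ccc,
```

This starts an external process per call, so for many small strings the pure bash answers are likely cheaper.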

Upvotes: 3
