user1408605

Reputation: 45

BASH regex syntax for replacing a sub-string

I'm working in bash and want to remove a substring from a string. I use grep to detect the substring, and that part works: my if conditions are true, and when I test the pattern in other tools it selects exactly the part of the string I want.

When it comes to removing the element from the string I'm having difficulty.

I want to remove something like ": Series 1", where the number can vary (including zero-padded numbers), the "s" may be lower case, and there may be extra spaces.

    temp='Testing: This is a test: Series 1'

    echo "A. "$temp
    if echo "$temp" | grep -q -i ":[ ]*[S|s]eries[ ]*[0-9]*" && [ "$temp" != "" ]; then
        title=$temp
        echo "B. "$title
        temp=${title//:[ ]*[S|s]eries[ ]*[0-9]*/ }
        echo "C. "$temp
    fi
    # I trim temp for spaces here
    series_title=${temp// /_}   
    echo "D. "$series_title

The problem is the output at points C and D, which is:

    C. Testing
    D. Testing_

Upvotes: 0

Views: 70

Answers (1)

Anubis

Reputation: 7425

You can perform regex matching from bash alone without using external tools.
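
For context, here is a rough sketch of why the `${title//.../ }` line in the question collapses to `Testing`. In `${var//pattern/repl}` bash treats the pattern as a glob, not a regex, so a bare `*` means "any string" rather than "zero or more of the previous item". The variable names `glob_result`, `pat` and `extglob_result` are mine, and the `extglob` variant is just one possible alternative, not something from the question.

```bash
#!/usr/bin/env bash
title='Testing: This is a test: Series 1'

# Glob semantics: ':[ ]*' is "colon, one space, then anything", so the match
# starts at the FIRST ': ' and runs to the end of the string.
glob_result=${title//:[ ]*[S|s]eries[ ]*[0-9]*/ }
printf '[%s]\n' "$glob_result"       # prints [Testing ]

# With extglob enabled, *(X) means "zero or more X", which is closer to a regex star.
shopt -s extglob
pat=':*( )[Ss]eries*( )*([0-9])'     # assumed pattern, kept in a variable for readability
extglob_result=${title//$pat/}
printf '[%s]\n' "$extglob_result"    # prints [Testing: This is a test]
```

Note that `[S|s]` in either syntax is a character class containing `S`, `|` and `s`; `[Ss]` is what was intended.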

Your exact requirement isn't clear, but judging from your code, I think the following will help.

temp='Testing: This is a test: Series 1'

# Do a regex match and capture the part we want to keep,
# i.e. everything before the `:` that starts the series suffix.
# Exit with a non-zero status if the pattern does not match.
[[ $temp =~ (.*):\ *[Ss]eries\ *[0-9]* ]] || { echo "regex match failed"; exit 1; }

# now you can use the extracted groups as follows    
echo "${BASH_REMATCH[1]}"    # Output = Testing: This is a test
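
If it helps, the same match can be wrapped up to cover the rest of the script in the question (the underscore step). This is only a sketch: `strip_series` is a made-up helper name, and I have assumed the series suffix sits at the end of the string and always has at least one digit, which may not hold for your data.

```bash
#!/usr/bin/env bash

# strip_series: hypothetical helper, not part of the original code.
# Prints the input with a trailing ": Series N" suffix removed, if present.
strip_series() {
    local input=$1
    # Keeping the regex in a variable avoids backslash-escaping the spaces.
    # Assumption: suffix is anchored at the end and has one or more digits.
    local re='^(.*):[[:space:]]*[Ss]eries[[:space:]]*[0-9]+[[:space:]]*$'
    if [[ $input =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        # No series suffix found: pass the input through unchanged.
        printf '%s\n' "$input"
    fi
}

temp=$(strip_series 'Testing: This is a test: series  01')
series_title=${temp// /_}          # replace every space with an underscore
echo "D. $series_title"            # prints D. Testing:_This_is_a_test
```

Quoting the variable inside the `echo` string (rather than `"D. "$temp`) keeps any leading or trailing spaces visible while you debug.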

As mentioned in the comments, if you need to keep the parts both before and after the removed section, add a second capture group:

temp='Testing: This is a test: Series 1 <keep this>'
[[ $temp =~ (.*):\ *[Ss]eries\ *[0-9]*\ *(.*) ]] || { echo "invalid"; exit 1; }
echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"  # Output = Testing: This is a test <keep this>

Keep in mind that [0-9]* also matches zero digits. If you need at least one digit, use [0-9]+ instead. The same applies to <space>* (zero or more spaces) and the other starred parts of the pattern.
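
A quick way to see the difference between the two quantifiers is to try both on a string that has no number at all. The variable names here are only for illustration.

```bash
#!/usr/bin/env bash
no_digits='Testing: Series'

re_star=': *[Ss]eries *[0-9]*'   # digits optional
re_plus=': *[Ss]eries *[0-9]+'   # at least one digit required

if [[ $no_digits =~ $re_star ]]; then star_matched=yes; else star_matched=no; fi
if [[ $no_digits =~ $re_plus ]]; then plus_matched=yes; else plus_matched=no; fi

echo "star: $star_matched, plus: $plus_matched"   # prints star: yes, plus: no
```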

Upvotes: 2
