Robby75

Reputation: 3455

Get string between strings in bash

I want to get the string between <sometag param=' and '>.

I tried the method from "Get any string between 2 string and assign a variable in bash" to get the "x":

 echo "<sometag param='x'><irrelevant stuff='nonsense'>" | tr "'" _ | sed -n 's/.*<sometag param=_\(.*\)_>.*/\1/p'

The problem (apart from the inefficiency of the extra tr step, which is only there because I cannot manage to escape the apostrophe correctly for sed) is that sed matches greedily, i.e. it takes the longest possible match, so the output is:

 x_><irrelevant stuff=_nonsense

but the correct output would be the shortest match, in this example just "x".

Thanks for your help

Upvotes: 1

Views: 6840

Answers (2)

aktivb

Reputation: 2082

You don't have to assemble a regex in cases like this. You can just use ' as the field separator and take the second field:

in="<sometag param='x'><irrelevant stuff='nonsense'>"

IFS="'" read -r x whatiwant y <<< "$in"         # bash (-r keeps backslashes literal)
echo "$whatiwant"

awk -F\' '{print $2}' <<< "$in"                 # awk
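
One caveat worth illustrating: splitting on ' returns the first quoted value on the line, whichever tag it belongs to. The sketch below assumes a hypothetical second input (the variable named swapped, not from the question) in which the irrelevant tag comes first.

```bash
#!/usr/bin/env bash
# Field 2 after splitting on ' is the first quoted value on the line.
in="<sometag param='x'><irrelevant stuff='nonsense'>"
IFS="'" read -r _ first_value _ <<< "$in"
echo "$first_value"

# Hypothetical input (an assumption for illustration): same tags, reversed order.
swapped="<irrelevant stuff='nonsense'><sometag param='x'>"
IFS="'" read -r _ swapped_value _ <<< "$swapped"
echo "$swapped_value"
```

So this approach fits best when the wanted value is known to be the first quoted string in the input. Otherwise a pattern that names the tag is probably safer.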

Upvotes: 0

Steve

Reputation: 54392

You are probably looking for something like this:

sed -n "s/.*<sometag param='\([^']*\)'>.*/\1/p"

Test:

echo "<sometag param='x'><irrelevant stuff='nonsense'>" | sed -n "s/.*<sometag param='\([^']*\)'>.*/\1/p"

Results:

x

Explanation:

  • Instead of the greedy capture .*, use a negated character class: [^']* which means match anything except ' any number of times. It is still greedy, but it cannot run past the closing quote, so it behaves like a shortest match here. To anchor the end of the match, it is followed by: '>.
  • Wrapping the sed expression in double quotes means you don't need to escape the single quotes. If you wanted to keep the expression in single quotes instead, you'd do this:

... | sed -n 's/.*<sometag param='\''\([^'\'']*\)'\''>.*/\1/p'

Notice that the single quotes aren't really escaped inside the expression. The single-quoted string is closed, a backslash-escaped single quote is inserted, and the single-quoted string is re-opened. Think of '\'' as a four-character escape sequence.
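
To make the quoting trick concrete, here is a minimal sketch (the variable names are illustrative, not from the answer). It first applies '\'' to a short string, then checks that the single-quoted and double-quoted forms of the sed script reach sed as the same text.

```bash
#!/usr/bin/env bash
# Three adjacent pieces, 'it' then \' then 's', are joined by the shell into one word.
quoted='it'\''s'
echo "$quoted"

# Both quoting styles should hand sed the same script text.
script_single='s/.*<sometag param='\''\([^'\'']*\)'\''>.*/\1/p'
script_double="s/.*<sometag param='\([^']*\)'>.*/\1/p"
if [ "$script_single" = "$script_double" ]; then
    echo "identical"
fi
```

Inside double quotes the shell leaves \( and \1 untouched, because a backslash is only special there before $, `, ", \ and a newline. That is why the double-quoted version needs no extra escaping.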


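Personally, I'd use GNU grep. Its -P option enables Perl-compatible regexes, which support lookarounds and the lazy quantifier .*?, and -o prints only the matched part. That makes for a slightly shorter solution. Run it like: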

... | grep -oP "(?<=<sometag param=').*?(?='>)"

Test:

echo "<sometag param='x'><irrelevant stuff='nonsense'>" | grep -oP "(?<=<sometag param=').*?(?='>)"

Results:

x
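
Where grep -P is not available, sed has no lazy quantifier to fall back on, so the negated character class does that job. As a rough side-by-side sketch (the variable names are assumptions for illustration), the greedy and bounded captures can be compared on the question's input:

```bash
#!/usr/bin/env bash
input="<sometag param='x'><irrelevant stuff='nonsense'>"

# Greedy .* runs on to the LAST '> on the line.
greedy=$(printf '%s\n' "$input" | sed -n "s/.*<sometag param='\(.*\)'>.*/\1/p")

# [^']* cannot cross a quote, so it stops at the first closing '.
bounded=$(printf '%s\n' "$input" | sed -n "s/.*<sometag param='\([^']*\)'>.*/\1/p")

echo "greedy:  $greedy"
echo "bounded: $bounded"
```

The greedy form reproduces the over-long match described in the question, while the bounded form yields just x.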

Upvotes: 3
