Reputation: 3455
I want to get the string between <sometag param='
and '>
I tried to use the method from Get any string between 2 string and assign a variable in bash to get the "x":
echo "<sometag param='x'><irrelevant stuff='nonsense'>" | tr "'" _ | sed -n 's/.*<sometag param=_\(.*\)_>.*/\1/p'
The problem (apart from low efficiency because I just cannot manage to escape the apostrophe correctly for sed) is that sed matches the maximum, i.e. the output is:
x_><irrelevant stuff=_nonsense
but the correct output would be the minimum-match, in this example just "x"
Thanks for your help
Upvotes: 1
Views: 6840
Reputation: 2082
You don't have to assemble regexes in those cases, you can just use ' as the field separator
in="<sometag param='x'><irrelevant stuff='nonsense'>"
IFS="'" read x whatiwant y <<< "$in" # bash
echo "$whatiwant"
awk -F\' '{print $2}' <<< "$in" # awk
Upvotes: 0
Reputation: 54392
You are probably looking for something like this:
sed -n "s/.*<sometag param='\([^']*\)'>.*/\1/p"
Test:
echo "<sometag param='x'><irrelevant stuff='nonsense'>" | sed -n "s/.*<sometag param='\([^']*\)'>.*/\1/p"
Results:
x
Explanation:
[^']*
which means match anything except '
any number of times. To make the pattern stick, this is followed by: '>
.-
... | sed -n 's/.*<sometag param='\''\([^'\'']*\)'\''>.*/\1/p'
Notice how that the single quotes aren't really escaped. The sed
expression is stopped, an escaped single quote is inserted and the sed
expression is re-opened. Think of it like a four character escape sequence.
Personally, I'd use GNU grep
. It would make for a slightly shorter solution. Run like:
... | grep -oP "(?<=<sometag param=').*?(?='>)"
Test:
echo "<sometag param='x'><irrelevant stuff='nonsense'>" | grep -oP "(?<=<sometag param=').*?(?='>)"
Results:
x
Upvotes: 3