user1706022

Reputation: 13

sed regex not being greedy?

In bash I have a string variable tempvar, which is created thus:

tempvar=`grep -n 'Mesh Tally' ${meshtalfile}`

meshtalfile is a (large) input file which contains some header lines and a number of blocks of data lines, each marked by a beginning line which is searched for in the grep above.

In the case at hand, the variable tempvar contains the following string:

5: Mesh Tally Number 4 977236: Mesh Tally Number 14 1954467: Mesh Tally Number 24 4354479: Mesh Tally Number 34

I now wish to extract the line number relating to a particular mesh tally number, so I define a variable meshnum1 as equal to 24 and run the following sed command:

echo ${tempvar} | sed -r "s/^.*([0-9][0-9]*):\sMesh\sTally\sNumber\s${meshnum1}.*$/\1/"

This is where things go wrong. I expect the output 1954467, but instead I get 7. Trying with number 34 instead returns 9 instead of 4354479. It seems that sed is returning only the last digit of the number - which surely violates the principle of greedy matching? And oddly, when I move the open parenthesis ( left a couple of characters to include .*, it returns the whole line up to and including the single character it was previously returning. Surely it cannot be greedy in one situation and antigreedy in another? Hopefully I have just done something stupid with the syntax...

Upvotes: 1

Views: 1190

Answers (3)

Steve

Reputation: 54592

There's no need for sed. Here's one way using GNU grep:

echo "$tempvar" | grep -oP "[0-9]+(?=:\sMesh\sTally\sNumber\s${meshnum1}\b)"

Upvotes: 1

The problem is that the leading .* is greedy too, so it consumes as many characters as it can, including most of the digits of the line number. Because [0-9][0-9]* only requires a single digit, the .* takes everything up to the last digit and leaves just that one digit for the capture group.
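To see this in action, here is a small sketch that reuses the string from the question and captures the leading .* as well, so you can see exactly what it swallowed. It assumes GNU sed (for -r and \s); the variable names greedy and swallowed are just for illustration.

```shell
# Sample string copied from the question.
tempvar='5: Mesh Tally Number 4 977236: Mesh Tally Number 14 1954467: Mesh Tally Number 24 4354479: Mesh Tally Number 34'
meshnum1=24

# Original pattern: the leading .* takes every digit except the last one.
greedy=$(echo "${tempvar}" | sed -r "s/^.*([0-9][0-9]*):\sMesh\sTally\sNumber\s${meshnum1}.*$/\1/")
echo "${greedy}"      # prints 7

# Same pattern with the .* captured too, printed in brackets for clarity.
swallowed=$(echo "${tempvar}" | sed -r "s/^(.*)([0-9][0-9]*):\sMesh\sTally\sNumber\s${meshnum1}.*$/[\1] [\2]/")
echo "${swallowed}"   # first bracket ends in ...195446, second holds only 7
```

Both parts are greedy, but the regex engine satisfies them left to right, so the first .* wins every character it can while still letting the rest of the pattern match.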

A solution could be:

echo ${tempvar} | sed -r "s/^.*\s([0-9][0-9]*):\sMesh\sTally\sNumber\s${meshnum1}.*$/\1/"

Here the \s between the .* and the [0-9][0-9]* explicitly forces there to be a space before the digits you want to match, so the .* can no longer take any of them.
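One caveat: requiring a space before the digits means the very first entry (5: Mesh Tally Number 4), which has nothing in front of it, would not match. A possible variation, again assuming GNU sed, makes the prefix optional and also requires a space or end of string after the tally number, so that 2 does not match 24. The function name line_for_tally is made up for this sketch.

```shell
# Sample string copied from the question.
tempvar='5: Mesh Tally Number 4 977236: Mesh Tally Number 14 1954467: Mesh Tally Number 24 4354479: Mesh Tally Number 34'

# Hypothetical helper: print the line number for the tally number given as $1.
# (.*\s)?  optional prefix that must end in whitespace
# (\s.*)?$ the tally number must be followed by whitespace or the end of the string
# -n with the p flag prints only when the substitution succeeded.
line_for_tally() {
    echo "${tempvar}" | sed -rn "s/^(.*\s)?([0-9]+):\sMesh\sTally\sNumber\s${1}(\s.*)?$/\2/p"
}

line_for_tally 24   # prints 1954467
line_for_tally 4    # prints 5
line_for_tally 2    # prints nothing, since there is no tally number 2
```

Without -n and p, sed would echo the whole unmodified string when nothing matches, which is easy to mistake for a result.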

Hope this helps =)

Upvotes: 2

Darian Lewin

Reputation: 182

Are the values in $tempvar supposed to be on multiple lines or on a single line? If it is a single line, ".*$" matches to the end of the line, meaning all the other values too, right?
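For what it's worth, the variable most likely does hold one match per line, and it is the unquoted echo ${tempvar} that flattens it: word splitting turns the newlines into spaces. The sketch below fakes the grep -n output with printf (an assumption, since I don't have the real file) and assumes GNU sed. With the quotes in place each match stays on its own line, and no leading .* is needed at all.

```shell
# Stand-in for the grep -n output; the real data would come from ${meshtalfile}.
tempvar=$(printf '%s\n' \
    '5: Mesh Tally Number 4' \
    '977236: Mesh Tally Number 14' \
    '1954467: Mesh Tally Number 24' \
    '4354479: Mesh Tally Number 34')
meshnum1=24

# Unquoted: the newlines are lost to word splitting, so sed sees a single line.
unquoted_lines=$(echo ${tempvar} | wc -l)

# Quoted: the newlines survive, so sed sees one match per line.
quoted_lines=$(echo "${tempvar}" | wc -l)

# Anchoring at ^ and $ on each line picks out exactly the wanted entry.
linenum=$(echo "${tempvar}" | sed -rn "s/^([0-9]+):\sMesh\sTally\sNumber\s${meshnum1}\s*$/\1/p")
echo "${linenum}"   # prints 1954467
```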

Upvotes: 1
