pandamonium

Reputation: 131

sed seems to match pattern properly only when newline inserted

I am currently running the following sed command:

sed 's/P(\(.*\))\\mid(\(.*\))/\\condprob{\1}{\2}/g' myfile.tex

Essentially, I have inherited an oddly formatted tex file, and want to replace everything like this:

P(<foo>)\mid(<bar>)

with this:

\condprob{<foo>}{<bar>}

The file I am trying to run sed on contains the following line:

P(\vec{m}_i)\mid(t,h,\alpha) = \prod_{u\in\mathcal{U}} P(\vec{m}_{iu})\mid(t,h,\alpha)

Which I would like to change to this:

\condprob{\vec{m}_i}{t,h,\alpha} = \prod_{u\in\mathcal{U}} \condprob{\vec{m}_{iu}}{t,h,\alpha}

However, sed keeps missing the first \mid and instead gives me this:

\condprob{\vec{m}_i)\mid(t,h,\alpha) = \prod_{u\in\mathcal{U}} P(\vec{m}_{iu}}{t,h,\alpha}

If I add a line break at the = sign, it matches everything fine.

Can someone please a) help me resolve this, and b) perhaps tell me why it is happening?

Thanks.

Edit: thanks choroba and SloopJon, you've both answered the why, and SloopJon's solution is exactly what I needed. choroba: I guess I will have to wait another day to learn Perl.

For those who are interested, SloopJon's solution, translated to my problem, looks like this (match everything that isn't a closing parenthesis):

sed 's/P(\([^)]*\))\\mid(\([^)]*\))/\\condprob{\1}{\2}/g' myfile.tex

Upvotes: 1

Views: 39

Answers (2)

SloopJon

Reputation: 383

It looks like you expect P(\(.*\)) to match only P(\vec{m}_i), but the * quantifier is greedy, so it actually matches P(\vec{m}_i)\mid...P(\vec{m}_{iu}). There are two common fixes for this: use a non-greedy quantifier if your tool supports it, or change the pattern so that it only matches what you expect. For example, if you know that parentheses won't nest in this P() construct, change .* to [^)]*.
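
To see the greedy match concretely, here is a small sketch that runs both patterns on the line from the question. The variable names (line, greedy, fixed) are only for illustration, and it assumes, as above, that parentheses never nest inside the P() construct.

```shell
# Sample line from the question (single quotes keep the backslashes literal).
line='P(\vec{m}_i)\mid(t,h,\alpha) = \prod_{u\in\mathcal{U}} P(\vec{m}_{iu})\mid(t,h,\alpha)'

# Greedy: the first .* runs to the last ")\mid(" on the line, so only one match is made.
greedy=$(printf '%s\n' "$line" | sed 's/P(\(.*\))\\mid(\(.*\))/\\condprob{\1}{\2}/g')

# Negated class: [^)]* cannot cross a ")", so each P(...)\mid(...) is matched on its own.
fixed=$(printf '%s\n' "$line" | sed 's/P(\([^)]*\))\\mid(\([^)]*\))/\\condprob{\1}{\2}/g')

printf '%s\n' "$greedy" "$fixed"
```

The first line printed should be the broken output from the question, and the second should have both occurrences replaced.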

Edit: I also suggest that you look for a regex visualizer or debugger when you have a problem like this. For example, pasting your example into debuggex.com makes it clear what's happening.

Upvotes: 2

choroba

Reputation: 241848

The problem is the greediness of the * quantifier. It matches as many times as it can, i.e. it doesn't stop at the first ).

You can try Perl, which supports the "non-greedy" (frugal, lazy) quantifier *?:

perl -pe 's/P\((.*?)\)\\mid\((.*?)\)/\\condprob{$1}{$2}/g' myfile.tex
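
Greediness also accounts for the line-break observation in the question: sed applies the substitution to one line at a time, so .* can never run past the end of a line. A rough sketch, with the example split at the = sign into two lines by printf (the variable name split is only for illustration):

```shell
# The example split at the "=" sign; printf prints each argument as its own line.
split=$(printf '%s\n' \
  'P(\vec{m}_i)\mid(t,h,\alpha) =' \
  '\prod_{u\in\mathcal{U}} P(\vec{m}_{iu})\mid(t,h,\alpha)' |
  sed 's/P(\(.*\))\\mid(\(.*\))/\\condprob{\1}{\2}/g')

printf '%s\n' "$split"
```

The original greedy pattern only works here because each line then holds a single P(...)\mid(...). A non-greedy or [^)]* pattern does not depend on where the lines happen to break.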

Upvotes: 1
