Reputation: 131
I am currently running the following sed command:
sed 's/P(\(.*\))\\mid(\(.*\))/\\condprob{\1}{\2}/g' myfile.tex
Essentially, I have inherited an oddly formatted tex file, and want to replace everything like this:
P(<foo>)\mid(<bar>)
with this:
\condprob{<foo>}{<bar>}
The file I am trying to run sed on contains the following line:
P(\vec{m}_i)\mid(t,h,\alpha) = \prod_{u\in\mathcal{U}} P(\vec{m}_{iu})\mid(t,h,\alpha)
Which I would like to change to this:
\condprob{\vec{m}_i}{t,h,\alpha} = \prod_{u\in\mathcal{U}} \condprob{\vec{m}_{iu}}{t,h,\alpha}
However, sed keeps missing the first \mid and instead gives me this:
\condprob{\vec{m}_i)\mid(t,h,\alpha) = \prod_{u\in\mathcal{U}} P(\vec{m}_{iu}}{t,h,\alpha}
If I add a line break at the = sign, both occurrences are matched correctly.
Can someone please a) help me resolve this, and b) perhaps tell me why it is happening?
Thanks.
Edit: thanks choroba and Sloopjon, you've both answered my "why", and Sloopjon's solution is exactly what I needed. choroba: I guess I will have to wait another day to learn Perl.
For those who are interested, Sloopjon's solution, translated to my problem, looks like this (each group matches any run of characters that are not a closing parenthesis):
sed 's/P(\([^)]*\))\\mid(\([^)]*\))/\\condprob{\1}{\2}/g' myfile.tex
Upvotes: 1
Views: 39
Reputation: 383
It looks like you expect P(\(.*\)) to match only P(\vec{m}_i), but the * quantifier is greedy, so it actually matches P(\vec{m}_i)\mid...P(\vec{m}_{iu}). There are two common fixes for this: use a non-greedy quantifier if your tool supports it, or change the pattern so that it only matches what you expect. For example, if you know that parentheses won't nest in this P() construct, change .* to [^)]*.
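To see the difference concretely, here is a small sketch that runs both patterns on the sample line from the question. The variable names (line, greedy, fixed) are only for illustration, and it assumes a sed that uses basic regular expressions by default, as GNU and BSD sed do.

```shell
# Sample line from the question; single quotes keep the backslashes literal.
line='P(\vec{m}_i)\mid(t,h,\alpha) = \prod_{u\in\mathcal{U}} P(\vec{m}_{iu})\mid(t,h,\alpha)'

# Greedy: .* grabs as much as it can, so the first group runs up to
# the LAST ")\mid(" on the line and only one substitution happens.
greedy=$(printf '%s\n' "$line" | sed 's/P(\(.*\))\\mid(\(.*\))/\\condprob{\1}{\2}/g')

# Negated class: [^)]* cannot cross a ")", so each group stops at
# its own closing parenthesis and both occurrences are replaced.
fixed=$(printf '%s\n' "$line" | sed 's/P(\([^)]*\))\\mid(\([^)]*\))/\\condprob{\1}{\2}/g')

printf '%s\n' "$greedy" "$fixed"
```

The first line printed should be the broken output shown in the question, and the second should be the desired one with both \condprob calls.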
Edit: I also suggest that you look for a regex visualizer or debugger when you have a problem like this. For example, pasting your example into debuggex.com makes it clear what's happening.
Upvotes: 2
Reputation: 241848
The problem is the greediness of the * quantifier. It matches as many times as it can, i.e. it doesn't stop at the first ).
You can try Perl, which features a "non-greedy" (frugal, lazy) *? quantifier:
perl -pe 's/P\((.*?)\)\\mid\((.*?)\)/\\condprob{$1}{$2}/g'
Upvotes: 1