Reputation: 3531
I am parsing a file that contains some HTML tags and converting them into LaTeX commands.
cat text
<Text>A <strong>ASDFF</strong> is a <em>cerebrovafdfasfscular</em> condifasdftion caufadfsed fasdfby tfdashe l
ocfsdafalised <span style="text-decoration: underline;">ballooning</span> or difdaslation of an arfdatery in thdfe bfdasrai
n. Smadfsall aasdneurysms may dadisplay fdasno ofadsbvious sdfasigns (<span style="text-decoration: underline;"><em><str
ong>asymptomatic</strong></em></span>) bfdasut lfdsaarger afdasneurysms maydas besda asfdsasociated widfth sdsfudd
sed -e 's|<strong>\(.*\)</strong>|\\textbf{\1}|g' test
cat out
<Text>A \textbf{ASDFF</strong> is a <em>cerebrovafdfasfscular</em> condifasdftion caufadfsed fasdfby tfdashe locfsda
falised <span style="text-decoration: underline;">ballooning</span> or difdaslation of an arfdatery in thdfe bfdasrain. Sma
dfsall aasdneurysms may dadisplay fdasno ofadsbvious sdfasigns (<span style="text-decoration: underline;"><em><strong>
asymptomatic}</em></span>) bfdasut lfdsaarger afdasneurysms maydas besda asfdsasociated widfth sdsfudd
The expected output is \textbf{ASDFF}, but I get \textbf{ASDFF .........} instead. How can I get the expected result?
Regards
Upvotes: 2
Views: 378
Reputation: 2747
You may want to use a Perl regex instead.
perl -pe 's|<strong>(.*?)</strong>|\\textbf{$1}|g'
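If Perl is not an option, a common sed-only workaround is to replace the greedy .* with a negated character class. The sketch below is only an illustration: the sample line is made up for the demo, and it assumes the text inside each <strong> element contains no nested tags (no "<" character), which happens to hold for the text in the question.

```shell
#!/bin/sh
# Hypothetical one-line sample with two separate <strong> elements.
line='A <strong>ASDFF</strong> is <em><strong>asymptomatic</strong></em>'

# [^<]* cannot cross a "<", so each match ends at its own </strong>.
# Limitation: a <strong> element that contains nested tags is left untouched.
result=$(printf '%s\n' "$line" |
  sed -e 's|<strong>\([^<]*\)</strong>|\\textbf{\1}|g')

printf '%s\n' "$result"
```

This keeps everything in sed, at the cost of skipping any <strong> element whose content itself contains markup.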
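Your problem is the one discussed in the question "Non-greedy regex matching in sed": in sed, .* is greedy and matches up to the last </strong> on the line, and sed has no non-greedy quantifier such as Perl's .*?. Next time, you may want to reduce your case to a minimal example that isolates the real problem. For instance, instead of pasting the raw HTML, use something like this: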
fooTEXT1barfooTEXT2bar
Update
If greedy matching is what you actually want, just ignore this answer.
Upvotes: 1