fabee

Reputation: 545

multiple line tag content replacement if content matches

I am not very proficient in Perl, awk, or sed. I have been searching the web for a solution to my problem for a while now, without much success.

I would like to replace

<math> ... </math>

with

<math>\begin{align} ... \end{align}</math>

if ... contains \\. My problem is that the string between the <math> tags can span multiple lines. I managed to replace the tags within one line with sed but couldn't get it to run for multiple lines.

Any simple solution with perl, awk, or sed is very welcome. Thanks a lot.

Upvotes: 0

Views: 154

Answers (3)

potong

Reputation: 58473

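This might work for you (GNU sed). It slurps the whole file into the pattern space, swaps the tags and \\ for single placeholder bytes, restores every block that contains no \\ unchanged, and wraps the remaining blocks: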

sed ':a;$!{N;ba}
/[\x00\x01\x02]/q1
s/<math>/\x00/g
s/<\/math>/\x01/g
s/\\\\/\x02/g
s/\x00\([^\x01\x02]*\)\x01/<math>\1<\/math>/g
s/\x00/<math>\\begin{align}/g
s/\x01/\\end{align}<\/math>/g
s/\x02/\\\\/g' file

Upvotes: 0

lynxlynxlynx

Reputation: 1433

Use a separate expression for each tag and the script no longer cares whether a block spans several lines. Note that this wraps every <math> block, whether or not it contains \\:

sed -e 's,<math>,&\\begin{align},g' -e 's,</math>,\\end{align}&,g'

Edit: Multiline awk version:

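awk '
/<math>/ { inmath = 1 }                  # a math block starts on this line
inmath {
  buf = (buf == "" ? $0 : buf "\n" $0)   # collect the block, keeping line breaks
  if (index($0, "</math>")) {            # the block ends on this line
    if (index(buf, "\\\\")) {            # only wrap blocks that contain \\
      sub("<math>", "&\\begin{align}", buf)
      sub("</math>", "\\end{align}&", buf)
    }
    print buf
    buf = ""
    inmath = 0
  }
  next
}
{ print }                                # lines outside math blocks pass through
'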

Upvotes: 1

Birei

Reputation: 36272

Try the following Perl command. How does it work? It reads the file in slurp mode into the variable $f, then uses a regular expression in single-line mode (so that . also matches newlines) to add \begin{align} and \end{align} wherever \\ is found between math tags.

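perl -e '
    # Slurp the whole input into $f
    do {
        $/ = undef;
        $f = <>
    };
    # Wrap every <math> block containing \\ ; (?!</math>) keeps each match inside one block
    $f =~ s#(<math>)((?:(?!</math>).)*?\\\\(?:(?!</math>).)*)(</math>)#$1\\begin{align}$2\\end{align}$3#gs;
    print $f
' infile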

Upvotes: 0
