Daniel

Convert LaTeX to XML in Elisp

I'm new to Elisp and I need to convert a piece of LaTeX code to XML.

LaTeX:

\tag[read=true]{Please help}\tag[notread=false]{Please help II}

XML:

<tag read='true'> Please help </tag>
<tag notread='false'> Please help II </tag>

I wrote a regex that finds \tag, but now I need to read the read and notread keys, turn them into attributes, and then read the value after the "=". This is the regex I have tried:

[..] (while (re-search-forward "\\\\\\<tag\\>\\[" nil t) [..]

Answers (1)

tripleee

This is not a full solution, but it should demonstrate how to use capture groups in regular expressions and retrieve what they matched.

Briefly, every group you create with \\(...\\) in the regex is captured and can be recalled with (match-string N), where N is the group's number. Groups are numbered from 1 in the order of their opening parentheses, counting from the left, so a nested group gets a higher number than the group that encloses it.
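To illustrate the numbering with nested groups, here is a small sketch. The function name my-demo-group-numbering is hypothetical; because it matches a string with string-match rather than searching a buffer, the string is passed as the second argument to match-string.

```elisp
;; Groups are numbered by the position of their opening parenthesis:
;; group 1 is the whole "key=value" pair, group 2 the key, group 3 the value.
(defun my-demo-group-numbering ()
  "Return the three groups captured from a sample string (hypothetical demo)."
  (let ((s "read=true"))
    (when (string-match "\\(\\([a-z]+\\)=\\([a-z]+\\)\\)" s)
      ;; When the match was made against a string, pass that string too.
      (list (match-string 1 s) (match-string 2 s) (match-string 3 s)))))

(my-demo-group-numbering)
;; => ("read=true" "read" "true")
```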

(So if you have alternations, some groups will not participate in the match. If you apply the regex "\\(foo\\)\\|\\(bar\\)" to the string "bar" with string-match, (match-string 1 "bar") will be nil, not an empty string, and (match-string 2 "bar") will be "bar".)
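A quick way to see this is to match both strings and collect both groups. The helper my-demo-alternation is a hypothetical name used only for this sketch.

```elisp
;; With an alternation only one branch takes part in the match, so the
;; other group's match-string is nil rather than an empty string.
(defun my-demo-alternation (s)
  "Return groups 1 and 2 after matching S against foo-or-bar (hypothetical demo)."
  (when (string-match "\\(foo\\)\\|\\(bar\\)" s)
    (list (match-string 1 s) (match-string 2 s))))

(my-demo-alternation "bar")  ;; => (nil "bar")
(my-demo-alternation "foo")  ;; => ("foo" nil)
```

If you need to know which branch matched, test (match-beginning N) for nil before using the group.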

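(goto-char (point-min))
(while
    (re-search-forward
     "\\\\\\<\\(tag\\)\\>\\[\\([^][=]*\\)=\\([^][]*\\)\\]{\\([^}]*\\)}"
     nil t)
  ;; Replace the matched LaTeX with an XML element built from the groups.
  ;; The concat is evaluated before replace-match runs, so the match data
  ;; is still intact; FIXEDCASE and LITERAL are both t.
  (replace-match (concat "<" (match-string 1) " "
                         (match-string 2) "='" (match-string 3) "'>"
                         (match-string 4)
                         "</" (match-string 1) ">\n")
                 t t))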

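A minimal sketch, assuming you want to convert a string rather than the current buffer: the hypothetical helper my-latex-to-xml-string runs the same search loop in a temporary buffer and uses replace-match, so each LaTeX tag is replaced by its XML element instead of being kept alongside it. Like the regex, it only handles one NAME=VALUE attribute per tag.

```elisp
(defun my-latex-to-xml-string (latex)
  "Convert each \\tag[NAME=VALUE]{TEXT} in LATEX to XML (hypothetical helper)."
  (with-temp-buffer
    (insert latex)
    (goto-char (point-min))
    (while (re-search-forward
            "\\\\\\<\\(tag\\)\\>\\[\\([^][=]*\\)=\\([^][]*\\)\\]{\\([^}]*\\)}"
            nil t)
      ;; Build the element from groups 1-4 and swap it in for the match.
      (replace-match (concat "<" (match-string 1) " "
                             (match-string 2) "='" (match-string 3) "'>"
                             (match-string 4)
                             "</" (match-string 1) ">\n")
                     t t))
    (buffer-string)))

(my-latex-to-xml-string
 "\\tag[read=true]{Please help}\\tag[notread=false]{Please help II}")
;; => "<tag read='true'>Please help</tag>\n<tag notread='false'>Please help II</tag>\n"
```

Note that this does not add the spaces around the text that appear in the desired output; add them in the concat if you need them.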
That regex certainly is a monster; you might want to decompose and document it somewhat.

(defconst latex-to-xml-regex
  (concat "\\\\"                ; literal backslash
          "\\<"                 ; start of word (not really necessary)
          "\\(tag\\)"           ; group 1: capture tag
          "\\>"                 ; end of word (not really necessary)
          "\\["                 ; literal open square bracket
          "\\("                 ; group 2: attribute name
            "[^][=]*"             ; attribute name regex
          "\\)"                 ; group 2 end
          "="                   ; literal equals sign
          "\\("                 ; group 3: attribute value
            "[^][]*"              ; attribute value regex
          "\\)"                 ; group 3 end
          "\\]"                 ; literal close square bracket
          "{"                   ; begin text group
          "\\("                 ; group 4: text
            "[^}]*"               ; text regex
          "\\)"                 ; group 4 end
          "}")                  ; end text group
  "Regex for `latex-to-xml' (assuming your function is called that).")
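The same decomposition can also be written with Emacs's built-in rx macro, which builds the regex from s-expressions and does the escaping for you. This is a swapped-in technique, not what the answer above uses; my-latex-tag-rx and my-latex-tag-groups are hypothetical names, and the generated string may order the bracket characters differently while matching the same text.

```elisp
(require 'rx)

(defconst my-latex-tag-rx
  (rx "\\"                            ; literal backslash
      bow                             ; start of word
      (group "tag")                   ; group 1: tag name
      eow                             ; end of word
      "["                             ; literal open square bracket
      (group (* (not (any "][="))))   ; group 2: attribute name
      "="                             ; literal equals sign
      (group (* (not (any "]["))))    ; group 3: attribute value
      "]"                             ; literal close square bracket
      "{"                             ; begin text group
      (group (* (not (any "}"))))     ; group 4: text
      "}")                            ; end text group
  "Hypothetical rx-based equivalent of `latex-to-xml-regex'.")

(defun my-latex-tag-groups (s)
  "Return groups 1-4 of the first tag match in S (hypothetical helper)."
  (when (string-match my-latex-tag-rx s)
    (mapcar (lambda (n) (match-string n s)) '(1 2 3 4))))

(my-latex-tag-groups "\\tag[read=true]{Please help}")
;; => ("tag" "read" "true" "Please help")
```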
