user32089

Reputation: 13

Change <br> tag to <?linebreak?> using tcl tdom

I have an HTML input string that needs to be parsed and written out as DITA-compatible XML.

Input:

<p>Line with following newline<br>Line with two following newlines<br><br>Line with no following newline</p>

Desired Output:

<p>Line with following newline<?linebreak?>Line with two following newlines<?linebreak?><?linebreak?>Line with no following newline</p>

package require tdom

set xml {<p>Line with following newline<br>Line with two following newlines<br><br>Line with no following newline</p>}

puts "Input:"
puts "$xml"

set doc [dom parse -html -keepEmpties $xml]
set root [$doc documentElement]

foreach node [$root getElementsByTagName br] {
    $node delete
    #$node appendXML "<?linebreak?>"

}

puts "Output:"
puts [$doc asXML -indent none]

If I uncomment the line #$node appendXML "<?linebreak?>", the script fails. I'm new to tdom but not to Tcl. Alternatively, maybe someone has a different idea for preserving line breaks in XML, specifically DITA.

Upvotes: 1

Views: 105

Answers (1)

Shawn

Reputation: 52579

Once you call delete on a tdom node, it no longer exists, so naturally you get an error if you try to use it afterwards.

One approach: for each br node, create a new processing instruction node and replace the br node with it (which first requires getting the node's parent). Your loop would then look like:

foreach node [$root getElementsByTagName br] {
    set lb [$doc createProcessingInstruction linebreak ""]
    [$node parentNode] replaceChild $lb $node
    # replaceChild moves the old node to the document fragment list;
    # just get rid of it completely since we're not going to reuse it
    $node delete
}

and the modified program prints out

Input:
<p>Line with following newline<br>Line with two following newlines<br><br>Line with no following newline</p>
Output:
<html><p>Line with following newline<?linebreak ?>Line with two following newlines<?linebreak ?><?linebreak ?>Line with no following newline</p></html>
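
The output above is wrapped in <html>, presumably because the input was parsed with -html, whereas the desired output in the question is just the <p> element. A minimal sketch, assuming the wrapper is unwanted and the input contains a single <p>, is to serialize that node instead of the whole document. The variable names p and out are illustrative only, and whether the processing instruction is written with a space before ?> is up to tdom's serializer:

```tcl
package require tdom

set xml {<p>Line with following newline<br>Line with two following newlines<br><br>Line with no following newline</p>}

set doc [dom parse -html -keepEmpties $xml]
set root [$doc documentElement]

# Replace every <br> element with a linebreak processing instruction
foreach node [$root getElementsByTagName br] {
    set lb [$doc createProcessingInstruction linebreak ""]
    [$node parentNode] replaceChild $lb $node
    # The replaced node is no longer needed, so free it
    $node delete
}

# Assumption: exactly one <p> is expected; take the first match
set p [lindex [$root selectNodes //p] 0]

# Serialize only the <p> subtree, not the enclosing <html> element
set out [$p asXML -indent none]
puts $out

$doc delete
```

If the input can hold several top-level elements, looping over the children of the root and serializing each one would be the analogous approach.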

Upvotes: 1
