Dullroar

Reputation: 162

Pandoc drops "unknown" HTML elements when converting to markdown

Consider the following simple HTML:

<!DOCTYPE html>
<body>
<p>Test
  <object height="355" width="425">
    <param name="movie" value="http://www.youtube.com/v/DKk9rv2hUfA&amp;rel=1">
    <param name="wmode" value="transparent">
    <embed height="355" src="http://www.youtube.com/v/DKk9rv2hUfA&amp;rel=1" type="application/x-shockwave-flash" width="425">
  </object>
</p>
</body>

I want to convert this to markdown and have any elements without markdown equivalents (object, etc.) passed through unchanged as HTML. However, when I run it through pandoc (v1.13.1) with the following command line:

pandoc --from=html --to=markdown --output=C:\Temp\test.md C:\Temp\test.html

...the only output I get in test.md is:

Test

Am I missing some parameter, or is this not possible at all? I would expect it to be possible, given that markdown allows semi-arbitrary HTML to be embedded inline.

Note: I have already seen this question and answer, but when I try --parse-raw, pandoc simply passes all of the HTML through as HTML, which is not what I want.

Upvotes: 1

Views: 739

Answers (1)

mb21

Reputation: 39488

The --parse-raw parameter is indeed what you're looking for. For example:

$ echo '<h1>foo</h1><p>bar <object>baz</object></p>' | pandoc -f html -t markdown --parse-raw
foo
===

bar <object>baz</object>

However, pandoc seems to choke on the <embed> tag in your example and, as a result, leaves the outer <p> element as raw HTML instead of converting it to markdown. You should probably submit a bug report.
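If you do file a report, a reduced test case usually helps. The following is only a rough sketch of how one might check whether <embed> is really the trigger: it writes two inputs that differ only in the presence of the <embed> tag and converts both. The file names (with_embed.html, without_embed.html) are made up for illustration, and the script assumes a pandoc version that still accepts --parse-raw, such as the 1.13.1 mentioned in the question.

```shell
# Work in a throwaway directory so nothing existing is overwritten.
tmpdir=$(mktemp -d)

# Reduced input based on the question's HTML: a paragraph with <object> and <embed>.
cat > "$tmpdir/with_embed.html" <<'EOF'
<p>Test <object height="355" width="425"><embed height="355" src="http://www.youtube.com/v/DKk9rv2hUfA&amp;rel=1" type="application/x-shockwave-flash" width="425"></object></p>
EOF

# Second input: identical, except that the <embed ...> tag is deleted.
sed 's/<embed[^>]*>//' "$tmpdir/with_embed.html" > "$tmpdir/without_embed.html"

# Convert both files if pandoc is on the PATH.
# Assumption: this pandoc accepts --parse-raw (the flag used in the answer above).
if command -v pandoc >/dev/null 2>&1; then
  for name in with_embed without_embed; do
    echo "== $name =="
    pandoc -f html -t markdown --parse-raw "$tmpdir/$name.html" \
      || echo "pandoc rejected this input or flag" >&2
  done
fi
```

If the output for without_embed is markdown with the <object> passed through, while with_embed comes back as a raw <p> block, those two files and their outputs should be enough to attach to the report.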

Upvotes: 1
