fry

Reputation: 538

HTML tags in Rmarkdown to word document

Is it possible to use HTML tags in R Markdown documents that are rendered to Word?

For example:

---
output: word_document
---

# This is rendered as heading

<h1> But this is not </h1>

This works perfectly when rendering as html_document, but not when rendering as word_document.

A more specific question about tags has been asked here, but without a solution: Underline in RMarkdown to Microsoft Word

Upvotes: 2

Views: 1428

Answers (1)

tarleb

Reputation: 22659

Sure, here we go:

---
output:
  word_document:
    md_extensions: +raw_html-markdown_in_html_blocks
    pandoc_args: ['--lua-filter', 'read_html.lua']
---

# This is rendered as heading

<h1> And this is one, too </h1>

where read_html.lua must be a file in the same directory with this content:

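-- Parse raw HTML blocks with pandoc's HTML reader so that non-HTML
-- writers (such as docx) receive regular pandoc elements.
function RawBlock (raw)
  -- Only act on HTML input, and only when the output is not HTML itself.
  if raw.format:match 'html' and not FORMAT:match 'html' then
    return pandoc.read(raw.text, raw.format).blocks
  end
end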

Let's unpack the above to see how it works. The first thing you'll notice is the additional parameters passed to word_document. The md_extensions setting modifies the way pandoc parses the text; see here for a full list, or run pandoc --list-extensions=markdown in your terminal. We enable raw_html to make sure that pandoc does not discard raw HTML tags, and disable markdown_in_html_blocks to ensure that we get the whole HTML tag as one block in pandoc's internal format.

The next setting is pandoc_args, where we tell pandoc to use a Lua filter to modify the document during conversion. The filter picks out all HTML blocks, parses them as HTML instead of Markdown, and replaces the raw HTML with the parsing result.

So if you are using raw HTML that pandoc can read, you'll be fine. If you are using special instructions which pandoc cannot read, then the setup described above won't help either. You'd have to rewrite the markup in OOXML, the XML format used in docx.
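
For that last case, one option is to let the filter emit the OOXML itself. The following is only a minimal sketch and not a definitive implementation: it extends read_html.lua so that a hypothetical marker, the HTML comment <!-- pagebreak -->, becomes a Word page break. The marker and the two PAGEBREAK_* variables are assumptions made up for illustration; 'openxml' is the raw format name that pandoc's docx writer passes through unchanged.

```lua
-- Hypothetical marker; any raw HTML snippet of your choosing would work.
local PAGEBREAK_MARKER = '<!-- pagebreak -->'
-- OOXML for a paragraph that contains only a page break.
local PAGEBREAK_OOXML = '<w:p><w:r><w:br w:type="page"/></w:r></w:p>'

function RawBlock (raw)
  -- Leave non-HTML raw blocks, and any HTML output, untouched.
  if not raw.format:match 'html' or FORMAT:match 'html' then
    return nil
  end
  -- Special case: replace the marker with raw OOXML when writing docx.
  -- The final `true` argument makes this a plain-text search.
  if FORMAT:match 'docx' and raw.text:find(PAGEBREAK_MARKER, 1, true) then
    return pandoc.RawBlock('openxml', PAGEBREAK_OOXML)
  end
  -- Everything else: parse as HTML, as in the filter above.
  return pandoc.read(raw.text, raw.format).blocks
end
```

With this variant, a line containing only <!-- pagebreak --> in the Rmd file should start a new page in the Word output, while other HTML blocks are still parsed as before.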

Upvotes: 3
