oldhomemovie

Reputation: 15139

Carrying style IDs/names from HTML to .docx?

Is it possible to somehow tell pandoc to carry the names of styles from original HTML to .docx?

I understand that to tune the actual styles I should use a reference.docx file generated by pandoc. However, reference.docx is limited to the styles it already contains: headings, body text, block text, etc.

I'd like to:

  1. specify "myStyle" style in the input HTML (via a "class" attribute, via any other HTML attribute or even via a filter code written in Lua),

    <html>
      <body>
        <p>Hello</p>
        <p class="myStyle">World!</p>
      </body>
    </html>
    
  2. add a custom "myStyle" to reference.docx using Word,

  3. run an HTML-to-docx conversion and expect pandoc to generate a paragraph element with "myStyle" (instead of BodyText, which I believe it sets by default), so the end result looks like this (contents of word/document.xml inside the resulting output.docx, cut for brevity):

    <w:p>
      <w:pPr>
        <w:pStyle w:val="BodyText" />
      </w:pPr>
      <w:r>
        <w:t xml:space="preserve">Hello</w:t>
      </w:r>
    </w:p>
    <w:p>
      <w:pPr>
        <w:pStyle w:val="myStyle" />
      </w:pPr>
      <w:r>
        <w:t xml:space="preserve">World!</w:t>
      </w:r>
    </w:p>
    

There's some evidence styleId can be passed around, but I don't really understand it and am unable to find any documentation about it.

The documentation on Lua filters states that you can access attrs when manipulating a pandoc.Div, but it says nothing about whether pandoc will interpret any of those attrs in a meaningful way.

Upvotes: 4

Views: 820

Answers (1)

oldhomemovie

Reputation: 15139

Finally found what I needed: Custom styles. It's limited, but better than what I had arrived at earlier, and of course much better than nothing at all :)

I'll leave a step-by-step guide here in case anyone stumbles upon a similar question.

First, generate a reference.docx file like this:

    pandoc --print-default-data-file reference.docx > styles.docx

Then open the file in MS Word (I was using the macOS version).

Click the "New style..." button on the right and create a style to your liking, giving it the name you plan to reference from the HTML (mine is "eugene-is-testing"). In my case I made the text bold and blue.

Since I am converting from HTML to DOCX, here's my input.html. The custom-style attribute on the div carries the name of the style defined in Word:

    <html>
      <body>
        <div>Page 1</div>
        <div custom-style="eugene-is-testing">Page 2</div>
        <div>Page 3</div>
      </body>
    </html>
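
If the HTML is produced by a script rather than written by hand, the same attribute can be emitted programmatically. The following is only a minimal sketch in Python: `build_html` and its `(text, style)` input shape are hypothetical names chosen for illustration, not part of pandoc.

```python
from html import escape


def build_html(blocks):
    """Build a small HTML document from (text, style) pairs.

    `style` is the name of a Word style defined in the reference docx,
    or None for a block that should keep pandoc's default styling.
    This helper is a hypothetical illustration, not a pandoc API.
    """
    divs = []
    for text, style in blocks:
        # Only styled blocks get the custom-style attribute.
        attr = f' custom-style="{escape(style, quote=True)}"' if style else ""
        divs.append(f"    <div{attr}>{escape(text)}</div>")
    return "<html>\n  <body>\n" + "\n".join(divs) + "\n  </body>\n</html>\n"


# Example: write the three-page input used in this answer.
# with open("input.html", "w", encoding="utf-8") as fh:
#     fh.write(build_html([("Page 1", None),
#                          ("Page 2", "eugene-is-testing"),
#                          ("Page 3", None)]))
```

Both the text and the style name are escaped, so names containing quotes or ampersands should not break the markup.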

Run:

    pandoc --standalone --reference-doc styles.docx --output output.docx input.html

Finally, enjoy the result: in output.docx the "Page 2" paragraph is rendered with the custom style (bold and blue in my case).
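
To confirm the style actually landed in the file without opening Word, the paragraph styles can be read straight out of word/document.xml. This is a rough sketch using only the Python standard library; `paragraph_styles` is a hypothetical helper name. Note that it reports style IDs, which Word may derive from the style name rather than copy verbatim.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in document.xml).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_styles(docx):
    """Return one entry per <w:p> in the document body XML.

    `docx` is a path or a binary file-like object. Each entry is the
    w:val of the paragraph's <w:pStyle>, or None if it declares none.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    styles = []
    for para in root.iter(f"{{{W}}}p"):
        p_style = para.find(f"{{{W}}}pPr/{{{W}}}pStyle")
        styles.append(p_style.get(f"{{{W}}}val") if p_style is not None else None)
    return styles


# Example: print(paragraph_styles("output.docx"))
```

If the custom style was applied, its ID should appear in the returned list at the position of the styled paragraph.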

Upvotes: 5
