Raghav Arora

Reputation: 186

How to write code to MS Word using python retaining the formatting?

I want to create an MS Word document that compiles a lot of my existing code (in MATLAB and Python). I have been writing it with python-docx.

If I do something like:

file = open('task1.m', 'r')
document.add_paragraph(file)

Then the code is written to the Word document as plain text, without any syntax highlighting.

Is there some way I can write the code while retaining the programming language formatting? (Keeping the colors intact)

Upvotes: 1

Views: 3602

Answers (2)

Brian.Hoard

Reputation: 108

For a quick and dirty way to achieve this, Notepad++ has a feature where you can turn on syntax highlighting for your language. Then select the code, right-click, and choose "Plugin Commands > Copy Text with Syntax Highlighting". Now you can paste that into Word and the colors remain.

Upvotes: 2

pho

Reputation: 25489

The .m file doesn't contain color information. That's added by whatever IDE / editor you're using.

If you know (or can find out) how to insert HTML-formatted or RTF-formatted text into your Word document, check out the pygments module.

I'm not sure how you can write this rtf-formatted text into a word document. However, if you write it out to an RTF document, this can be opened and saved by Word.

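So let's say I run this code, saved as toword.py:

from docx import Document

from pygments import highlight
from pygments.lexers import PythonLexer
from pygments.formatters import RtfFormatter

# Read the source file to highlight (here, this script itself).
with open("toword.py", "r") as f:
    code = f.read()

# Render the code as an RTF string that carries the color information.
ht = highlight(code, PythonLexer(), RtfFormatter())

# Writing the RTF string to its own file works: Word can open it.
with open("rtffile.rtf", "w") as wf:
    wf.write(ht)

# add_paragraph() treats its argument as plain text, so this stores
# the raw RTF markup in the document rather than colored code.
doc = Document()
paragraph = doc.add_paragraph(ht)
doc.save("code.docx")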

There's also a pygments.lexers.matlab.MatlabLexer for MATLAB files. Or you could use pygments.lexers.get_lexer_for_filename(filename) to get a lexer from the filename.
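As a rough sketch of how the lexer choice could be automated for a mix of MATLAB and Python files: the helper name pick_lexer is hypothetical, and the explicit .m rule is an assumption made here because several Pygments lexers (MATLAB, Octave, Objective-C) claim that extension.

```python
from pathlib import Path

from pygments.lexers import TextLexer, get_lexer_for_filename
from pygments.lexers.matlab import MatlabLexer
from pygments.util import ClassNotFound


def pick_lexer(filename):
    """Return a Pygments lexer for `filename` (hypothetical helper)."""
    # Several lexers claim the .m extension, so choose MATLAB explicitly
    # instead of relying on the filename lookup to guess.
    if Path(filename).suffix == ".m":
        return MatlabLexer()
    try:
        return get_lexer_for_filename(filename)
    except ClassNotFound:
        # Unknown extension: fall back to plain, unhighlighted text.
        return TextLexer()
```

The returned lexer can be passed to highlight() in place of PythonLexer(), for example highlight(code, pick_lexer("task1.m"), RtfFormatter()).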

Opening rtffile.rtf in Word:

[Screenshot: rtffile.rtf opened in Word]

Opening code.docx in Word:

[Screenshot: code.docx opened in Word]


Alternatively, you can use the pandoc Python module together with the pandoc executable it calls. Pandoc can convert Markdown to the docx format, and it highlights code automatically when the Markdown contains fenced code blocks tagged with a language.

So with this code:

import pandoc

# Read the source file to convert (here, this script itself).
with open("toword.py", "r") as f:
    code = f.read()

# Wrap the code in a fenced block tagged "python" so pandoc highlights it.
md = f"`````python\n{code}\n`````"
doc = pandoc.Document()
doc.markdown = bytearray(md, encoding="utf-8")
doc.add_argument("out=code.docx")
# Accessing the docx attribute runs the conversion and writes code.docx.
doc.docx

we get the following code.docx:

[Screenshot: code.docx produced by pandoc, opened in Word]

You can play with the highlight style using the --highlight-style=... argument. More info here
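For comparison, here is a minimal sketch that calls the pandoc command line directly through subprocess instead of a wrapper module. The helper names and the "tango" style are choices made for illustration, and the fence-length logic is an added precaution so that backticks inside the code cannot close the Markdown fence early.

```python
import re
import shutil
import subprocess


def fenced_markdown(code, language):
    """Wrap `code` in a Markdown fence longer than any backtick run in it."""
    longest = max((len(m) for m in re.findall(r"`+", code)), default=0)
    fence = "`" * max(3, longest + 1)
    return f"{fence}{language}\n{code}\n{fence}\n"


def pandoc_command(md_path, docx_path, style="tango"):
    """Build the pandoc argument list (style name is an assumed example)."""
    return ["pandoc", md_path, "-o", docx_path, f"--highlight-style={style}"]


def convert(md_path, docx_path, style="tango"):
    """Run pandoc; requires the pandoc executable on PATH."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc executable not found on PATH")
    subprocess.run(pandoc_command(md_path, docx_path, style), check=True)
```

To use it, write fenced_markdown(code, "python") to a file such as code.md and call convert("code.md", "code.docx").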

Upvotes: 3
