Programmer_nltk
Programmer_nltk

Reputation: 915

Word document to python-docx

Objective: I use a word template to which I want to pass paragraph values from python.

Pipeline: The pipeline involves using python-docx and sending output paragraph to python docx;thus, creating a docx file.

from docx import Document
from docx.shared import Inches
document = Document()
document.add_heading('Document Title', 0)
r  = """sample paragraph"""
p = document.add_paragraph(r)
document.add_page_break()
document.save('Test.docx')

Question:

I already got a sample template that I want to use, is it possible to create a blue print of the template using python-docx and continue with the content blocking?

By blue print, I mean the section header, footer, margin, spacing to name a few, must be preserved or coded in python-docx format automatically. So that I can send sample paragraphs to the relevant section.

If I need to create a template using another one as base; I believe, I need to hard code the section, margin and style again in python-docx. Is there a way to circumnavigate this work path?

Upvotes: 0

Views: 2451

Answers (2)

Martin Strnad
Martin Strnad

Reputation: 36

I have been using placeholders for this (ie, putting something like "[Counterparty]" there), then searching and replacing like this:

from docx import Document

document = Document(('./templates/yourfilename.docx'))

for paragraph in document.paragraphs:
    if "[Counterparty]" in paragraph.text:  
        paragraph.text = re.sub("[Counterparty]", "Replacing text", paragraph.text)

Upvotes: 1

scanny
scanny

Reputation: 28863

I think the approach you'll find most useful is to define paragraph styles in your template document that embody the paragraph and character formatting for the different types of paragraphs you want (heading, body paragraph, etc.), and then apply the correct style to each paragraph as you add it.

http://python-docx.readthedocs.io/en/latest/user/styles-understanding.html
http://python-docx.readthedocs.io/en/latest/user/styles-using.html

You'll still need to write the document from "top" to "bottom". If the items don't arrive in sequence you'll probably want to keep them organized in a memory data structure until you have them all and then write the document from that data structure.

There are ways around this, but there's no notion of a "cursor" in python-docx (yet) where you can insert a paragraph at an arbitrary location.

Upvotes: 1

Related Questions