ignacio.saravia
ignacio.saravia

Reputation: 327

Put Header with Python - docx

I am using Python-docx to create and write a Word document.

How i can put a text in document header using python-docx?

http://image.prntscr.com/image/8757b4e6d6f545a5ab6a08a161e4c55e.png

Thanks

Upvotes: 1

Views: 16654

Answers (7)

grantr
grantr

Reputation: 1060

For those of you looking to set custom headers w/docx:

I had to use a couple packages to get this to work. My use case was this: I was generating multiple templates and then merging them together, however when I merged them with docx the header from the master file (below) was applied to all sections and all sections were marked as linkedToPrevious = True despite being =False in the original files. However, docx does a really nice job appending files and having it come out error-free on the other end, so I decided to find a way to make it work. Code for reference:

master = Document(files[0])
composer = Composer(master)
footnotes_doc = Document('chapters/footnotes.docx')
for file in files[1:]:
    mergeDoc = Document(file)
    composer.append(mergeDoc)
composer.append(footnotes_doc)
composer.save("chapters/combined.docx")

So now I have a master doc (combined.docx) with all the proper sections however the headers need to be adjusted. You can't iterate over the document with docx, get the current section that you are in, and adjust it or set the headers linking to false. If you set to False you wipe the header completely. You can explicitly call the section and adjust it, but since everything after it is linked to previous, you change the rest of the document from that point. So I pulled in win32com:

Gets the number of sections and then iterates thru them backwards using win32com. This way as you remove linkedToPrevious, you preserve the header in place.

def getSections(document):
    sectionArray = {}
    sections = document.sections
    x = 1
    for section in sections:
        sectionArray[x] = section
        x += 1
    return sectionArray

start_doc = Document('chapters/combined.docx')
listArray = getSections(start_doc) #gets an array of all sections
keylist = list(reversed(sorted(listArray.keys()))) ##now reverse it

word = win32com.client.gencache.EnsureDispatch("Word.Application")
word = client.DispatchEx("Word.Application")
word.Visible = False
#tell word to open the document
word.Documents.Open(' C:\path to\combined.docx')
#open it internally
doc = word.Documents(1)

try:
    for item in keylist:
        word.ActiveDocument.Sections(item).Headers(win32com.client.constants.wdHeaderFooterPrimary).LinkToPrevious=False
        word.ActiveDocument.Sections(item).Headers(win32com.client.constants.wdHeaderFooterEvenPages).LinkToPrevious=False
    word.ActiveDocument.SaveAs("c:\wherever\combined_1.docx")
    doc.Close()
    word.Quit()
except:
    doc.Close()
    word.Quit()

ok so now the doc is primed to edit the headers, which we can do with docx easily and worry free now. First we need to parse the XML, which I use docx to access then feed to lxml, to get the location of the section needed:

xml = str(start_doc._element.xml) #this gets the full XML using docx
tree = etree.fromstring(xml)

WORD_NAMESPACE='{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'
TEXT = WORD_NAMESPACE + 't'
PARA = WORD_NAMESPACE + 'p'
SECT = WORD_NAMESPACE + 'sectPr'
sectionLoc = []
for item in tree.iter(PARA):
    for node in item.iter(TEXT):
        if 'Section' in node.text: #this is how I am identifying which headers I need to edit
            print(node.text)
            sectionLoc.append(node.text)    
            
    for sect in item.iter(SECT):
        print(sect)
        sectionLoc.append('section')
        # print(etree.tostring(sect))

counter =0
sectionLocs = []
for index, item in enumerate(sectionLoc): #just some logic to get the correct section number from the xml parse
    if 'Section' in item:
        sectionLocs.append(counter)
        continue
    counter += 1

#ok now use those locations with docx to adjust the headers
#remember that start_doc here needs to be the new result from win32 process- 
#so start_doc = Document('C:\path to\combined.docx') in this case
for item in sectionLocs: 
    section = start_doc.sections[item]
    header = section.header
    para_new = header.paragraphs[0]
    para_new.text = 'TEST!'

start_doc.save('willthiswork.docx')

This is a lot of work. I bet there is a way to do it entirely with win32com but I couldn't figure out how to get the relevant sections with it based on the content in the body of the page. The "sectPr" tags always come at the end of the page, so in combing the document for text I know is on the page that needs a new header, "Section," I know that the next printed out section is the one I want to edit so I just get it's location in the list.

I think this whole workflow is a hack but it works and I hope the sample code helps someone.

Upvotes: 2

scanny
scanny

Reputation: 29031

UPDATE: This feature has been implemented since the time of this answer.

As other respondents have noted below, the Section object provides access to its header objects.

header = document.sections[0].header

Note that a section can have up to three headers (first_page, odd_pages, even_pages) and each section can have its own set of headers. The most common situation is that a document has a single header and a single section.

A header is like a document body or table cell in that it can contain tables and/or paragraphs and by default has a single empty paragraph (it cannot contain zero paragraphs).

header.paragraphs[0].text = "My header text"

This is explained in greater detail on this page in the documentation::
https://python-docx.readthedocs.io/en/latest/user/hdrftr.html

Unfortunately this feature is not implemented yet. The page @SamRogers linked to is part of the enhancement proposal (aka. "analysis page"). The implementation is in progress however, by @eupharis, so might be available in a month or so. The ongoing pull request is here if you want to follow it. https://github.com/python-openxml/python-docx/pull/291

Upvotes: 6

black_specs88
black_specs88

Reputation: 1

    import docx

    document = docx.Document()

    header_section = document.sections[0]
    header = header_section.header
    header_text = header.paragraphs[0]
    header_text.text = "Header of document"

You can use \t either side of text to align it in the centre

Upvotes: 0

Siddhartha Mukherjee
Siddhartha Mukherjee

Reputation: 2961

I've been using it to work

header = document.sections[0].header
header.add_paragraph('Test Header')

Header is a subclass of BlockItemContainer, from which it inherits the same content editing capabilities as Document, such as .add_paragraph().

Upvotes: 3

Connor James
Connor James

Reputation: 71

This feature has been implemented. See: https://python-docx.readthedocs.io/en/latest/dev/analysis/features/header.html

You can add text to the header of a word document using python-docx as follows:

header = document.sections[0].header
head = header.paragraphs[0]
head.text = 'Add Your Text'

Upvotes: 4

Anton vBR
Anton vBR

Reputation: 18914

(With respect that this question is old...)

I have used a work-around in my project where my "client" wanted different headers in different pages by:

  1. Creating a document using python-docx and section breaks

  2. Execute a word macro file (*.xlsm) with two arguments: (1) fileName = path, docTitle = title of the document to be inserted in footer.

The macro file will open the newly created document and add headers and footers that are already inside the macro file. This would need to be modified if the header and footer text need to vary.

Pyton code:

wd = win32com.client.Dispatch("Word.Application")
wd.Visible = False
doc = wd.Documents.Open(pathToDOCM) # path here
wd.Run("Main.RunMain",fileName, docTitle) # 2 args
doc.Close()
del wd

VBA code:

VBA (inside *.xlsm) code:

Sub RunInside()
    Call RunMain("C:\Users\???\dokument.docx", "test")
End Sub

Sub RunMain(wordDocument As String, wordTitle As String)
    ' Create Headers
    Call CreateHeaders(wordDocument, wordTitle)
End Sub

Sub CreateHeaders(wordDocument As String, wordTitle As String)

    Dim i As Integer
    Dim outputName As String

    Dim aDoc As Document
    Dim oApp As Word.Application
    Dim oSec As Word.Section
    Dim oDoc As Word.Document

    Dim hdr1, hdr2 As HeaderFooter
    Dim ftr1, ftr2 As HeaderFooter

    'Create a new document in Word
    Set oApp = New Word.Application
    'Set oDoc = oApp.Documents.Add
    Set oDoc = oApp.Documents.Open(wordDocument)
    'Set aDoc as active document
    Set aDoc = ActiveDocument


    oDoc.BuiltInDocumentProperties("Title") = wordTitle


    For i = 1 To 9:
        Set hdr1 = aDoc.Sections(i).Headers(wdHeaderFooterPrimary)
        Set hdr2 = oDoc.Sections(i).Headers(wdHeaderFooterPrimary)

        Set ftr1 = aDoc.Sections(i).Footers(wdHeaderFooterPrimary)
        Set ftr2 = oDoc.Sections(i).Footers(wdHeaderFooterPrimary)


        If i > 1 Then
            With oDoc.Sections(i).Headers(wdHeaderFooterPrimary)
             .LinkToPrevious = False
            End With

            With oDoc.Sections(i).Footers(wdHeaderFooterPrimary)
             .LinkToPrevious = False
            End With
        End If

        hdr1.Range.Copy
        hdr2.Range.Paste

        ftr1.Range.Copy
        ftr2.Range.Paste

    Next i



    outputName = Left(wordDocument, Len(wordDocument) - 5)
    outputName = outputName + ".pdf"

    oDoc.SaveAs outputName, 17

    oDoc.Close SaveChanges:=wdSaveChanges

    Set oDoc = Nothing
    Set aDoc = Nothing

End Sub

Final remark: The code loops through different sections and copy-paste the header and footers. It also saves the document to *.PDF.

Upvotes: 1

Sam
Sam

Reputation: 17

You can use header.text like

header = section.header
header.text = 'foobar'

see http://python-docx.readthedocs.io/en/latest/dev/analysis/features/header.html?highlight=header for more information

Upvotes: 0

Related Questions