Reputation: 327
I am using python-docx to create and write a Word document.
How can I put text in the document header using python-docx?
http://image.prntscr.com/image/8757b4e6d6f545a5ab6a08a161e4c55e.png
Thanks
Upvotes: 1
Views: 16654
Reputation: 1060
For those of you looking to set custom headers with docx:
I had to use a couple of packages to get this to work. My use case: I was generating multiple documents from templates and then merging them together. When I merged them with docx, the header from the master file (below) was applied to all sections, and all sections were marked as linkedToPrevious = True despite being False in the original files. However, docx does a really nice job of appending files and having the result come out error-free on the other end, so I decided to find a way to make it work. Code for reference:
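from docx import Document
# assumption: Composer is the class provided by the docxcompose package
from docxcompose.composer import Composer

# files: list of paths to the .docx files to merge; the first one is the master
master = Document(files[0])
composer = Composer(master)
footnotes_doc = Document('chapters/footnotes.docx')
for file in files[1:]:
    mergeDoc = Document(file)
    composer.append(mergeDoc)
composer.append(footnotes_doc)  # footnotes go last
composer.save("chapters/combined.docx")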
So now I have a master doc (combined.docx) with all the proper sections, but the headers need to be adjusted. With docx you can't iterate over the document, find the section you are currently in, and adjust its header or set its linking to False: if you set it to False, you wipe the header completely. You can explicitly select a section and adjust it, but since everything after it is linked to previous, you change the rest of the document from that point on. So I pulled in win32com.
The code below gets the number of sections and then iterates through them backwards using win32com. This way, as you remove linkedToPrevious, you preserve each header in place.
def getSections(document):
    # map 1-based section numbers (as Word counts them) to docx sections
    sectionArray = {}
    sections = document.sections
    x = 1
    for section in sections:
        sectionArray[x] = section
        x += 1
    return sectionArray
import win32com.client
from docx import Document

start_doc = Document('chapters/combined.docx')
listArray = getSections(start_doc)  # dict of all sections, keyed 1..n
keylist = sorted(listArray.keys(), reverse=True)  # highest section number first

# EnsureDispatch also generates the wrappers that win32com.client.constants relies on
# (the original second dispatch call, client.DispatchEx, overwrote this one and is dropped)
word = win32com.client.gencache.EnsureDispatch("Word.Application")
word.Visible = False
# tell Word to open the document (raw strings keep the backslashes literal)
doc = word.Documents.Open(r'C:\path to\combined.docx')
try:
    for item in keylist:
        doc.Sections(item).Headers(win32com.client.constants.wdHeaderFooterPrimary).LinkToPrevious = False
        doc.Sections(item).Headers(win32com.client.constants.wdHeaderFooterEvenPages).LinkToPrevious = False
    doc.SaveAs(r"c:\wherever\combined_1.docx")
finally:
    # close the document and Word whether or not the loop succeeded
    doc.Close()
    word.Quit()
OK, so now the doc is primed for editing the headers, which we can do with docx easily and worry-free. First we need to parse the XML, which I access through docx and then feed to lxml, to find the location of the section needed:
from lxml import etree

xml = str(start_doc._element.xml)  # this gets the full XML using docx
tree = etree.fromstring(xml)
WORD_NAMESPACE = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'
TEXT = WORD_NAMESPACE + 't'
PARA = WORD_NAMESPACE + 'p'
SECT = WORD_NAMESPACE + 'sectPr'
sectionLoc = []
for item in tree.iter(PARA):
    for node in item.iter(TEXT):
        # this is how I am identifying which headers I need to edit
        # (node.text can be None for an empty w:t element, so guard against it)
        if node.text and 'Section' in node.text:
            print(node.text)
            sectionLoc.append(node.text)
    for sect in item.iter(SECT):
        print(sect)
        sectionLoc.append('section')
        # print(etree.tostring(sect))
counter = 0
sectionLocs = []
# just some logic to get the correct section number from the XML parse:
# counter holds the number of sectPr markers seen before each matching text
for item in sectionLoc:
    if 'Section' in item:
        sectionLocs.append(counter)
        continue
    counter += 1
# ok, now use those locations with docx to adjust the headers
# remember that start_doc here needs to be the new result from the win32 process,
# so start_doc = Document(r'C:\path to\combined.docx') in this case
for item in sectionLocs:
    section = start_doc.sections[item]
    header = section.header
    para_new = header.paragraphs[0]
    para_new.text = 'TEST!'
start_doc.save('willthiswork.docx')
This is a lot of work. I bet there is a way to do it entirely with win32com, but I couldn't figure out how to get the relevant sections with it based on the content in the body of the page. The "sectPr" tags always come at the end of their section, so when I comb the document for text that I know is on the page that needs a new header ("Section"), I know that the next sectPr printed out belongs to the section I want to edit, and I just take its location in the list.
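The counting idea can be sketched with the standard library alone. This is a simplified variant rather than the exact code above: SAMPLE is a tiny hand-written body fragment and sections_containing is a hypothetical helper name, both invented for illustration. It joins the text of each paragraph before matching and counts only sectPr elements that sit inside a paragraph's properties.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def sections_containing(body_xml, marker):
    """Return 0-based indexes of the sections whose body text contains marker."""
    root = ET.fromstring(body_xml)
    section_index = 0
    hits = []
    for para in root.iter(W + "p"):
        text = "".join(t.text or "" for t in para.iter(W + "t"))
        if marker in text and section_index not in hits:
            hits.append(section_index)
        # A sectPr inside a paragraph's properties closes the current section.
        if para.find(W + "pPr/" + W + "sectPr") is not None:
            section_index += 1
    return hits


# Hand-written sample: three sections, the marker text sits in the second one.
SAMPLE = (
    '<w:body xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:p><w:r><w:t>Intro</w:t></w:r></w:p>"
    "<w:p><w:pPr><w:sectPr/></w:pPr></w:p>"
    "<w:p><w:r><w:t>Section A</w:t></w:r></w:p>"
    "<w:p><w:pPr><w:sectPr/></w:pPr></w:p>"
    "<w:p><w:r><w:t>Appendix</w:t></w:r></w:p>"
    "<w:sectPr/>"
    "</w:body>"
)

result = sections_containing(SAMPLE, "Section")
print(result)
```

The indexes it returns are meant to be used the same way as sectionLocs, that is, as positions in document.sections.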
I think this whole workflow is a hack, but it works, and I hope the sample code helps someone.
Upvotes: 2
Reputation: 29031
UPDATE: This feature has been implemented since the time of this answer.
As other respondents have noted below, the Section object provides access to its header objects.
header = document.sections[0].header
Note that a section can have up to three headers (first_page, odd_pages, even_pages) and each section can have its own set of headers. The most common situation is that a document has a single header and a single section.
A header is like a document body or table cell in that it can contain tables and/or paragraphs and by default has a single empty paragraph (it cannot contain zero paragraphs).
header.paragraphs[0].text = "My header text"
This is explained in greater detail on this page in the documentation:
https://python-docx.readthedocs.io/en/latest/user/hdrftr.html
Original answer (now outdated): Unfortunately this feature is not implemented yet. The page @SamRogers linked to is part of the enhancement proposal (aka "analysis page"). The implementation is in progress, however, by @eupharis, so it might be available in a month or so. The ongoing pull request is here if you want to follow it: https://github.com/python-openxml/python-docx/pull/291
Upvotes: 6
Reputation: 1
import docx
document = docx.Document()
header_section = document.sections[0]
header = header_section.header
header_text = header.paragraphs[0]
header_text.text = "Header of document"
You can put \t on either side of the text to align it in the centre.
Upvotes: 0
Reputation: 2961
This has been working for me:
header = document.sections[0].header
header.add_paragraph('Test Header')
Header is a subclass of BlockItemContainer, from which it inherits the same content editing capabilities as Document, such as .add_paragraph().
Upvotes: 3
Reputation: 71
This feature has been implemented. See: https://python-docx.readthedocs.io/en/latest/dev/analysis/features/header.html
You can add text to the header of a Word document using python-docx as follows:
header = document.sections[0].header
head = header.paragraphs[0]
head.text = 'Add Your Text'
Upvotes: 4
Reputation: 18914
(Acknowledging that this question is old...)
I used a workaround in my project, where my "client" wanted different headers on different pages, by:
1. Creating a document using python-docx with section breaks.
2. Executing a Word macro file (*.docm) with two arguments: (1) fileName = path of the document, (2) docTitle = title of the document, to be inserted in the footer.
3. The macro file opens the newly created document and adds the headers and footers that are already inside the macro file. This would need to be modified if the header and footer text need to vary.
Python code:
import win32com.client

wd = win32com.client.Dispatch("Word.Application")
wd.Visible = False
doc = wd.Documents.Open(pathToDOCM)  # path to the macro file here
wd.Run("Main.RunMain", fileName, docTitle)  # 2 args
doc.Close()
del wd
VBA code (inside the *.docm file):
Sub RunInside()
    Call RunMain("C:\Users\???\dokument.docx", "test")
End Sub

Sub RunMain(wordDocument As String, wordTitle As String)
    ' Create Headers
    Call CreateHeaders(wordDocument, wordTitle)
End Sub

Sub CreateHeaders(wordDocument As String, wordTitle As String)
    Dim i As Integer
    Dim outputName As String
    Dim aDoc As Document
    Dim oApp As Word.Application
    Dim oSec As Word.Section
    Dim oDoc As Word.Document
    ' Give each variable its own type; "Dim a, b As T" leaves a as a Variant
    Dim hdr1 As HeaderFooter, hdr2 As HeaderFooter
    Dim ftr1 As HeaderFooter, ftr2 As HeaderFooter
    'Start a new Word instance and open the target document
    Set oApp = New Word.Application
    'Set oDoc = oApp.Documents.Add
    Set oDoc = oApp.Documents.Open(wordDocument)
    'Set aDoc as active document (the macro file that holds the headers and footers)
    Set aDoc = ActiveDocument
    oDoc.BuiltInDocumentProperties("Title") = wordTitle
    ' The section count is hard-coded to 9
    For i = 1 To 9
        Set hdr1 = aDoc.Sections(i).Headers(wdHeaderFooterPrimary)
        Set hdr2 = oDoc.Sections(i).Headers(wdHeaderFooterPrimary)
        Set ftr1 = aDoc.Sections(i).Footers(wdHeaderFooterPrimary)
        Set ftr2 = oDoc.Sections(i).Footers(wdHeaderFooterPrimary)
        If i > 1 Then
            With oDoc.Sections(i).Headers(wdHeaderFooterPrimary)
                .LinkToPrevious = False
            End With
            With oDoc.Sections(i).Footers(wdHeaderFooterPrimary)
                .LinkToPrevious = False
            End With
        End If
        hdr1.Range.Copy
        hdr2.Range.Paste
        ftr1.Range.Copy
        ftr2.Range.Paste
    Next i
    ' Strip ".docx" and save as PDF (17 = wdFormatPDF)
    outputName = Left(wordDocument, Len(wordDocument) - 5)
    outputName = outputName + ".pdf"
    oDoc.SaveAs outputName, 17
    oDoc.Close SaveChanges:=wdSaveChanges
    Set oDoc = Nothing
    Set aDoc = Nothing
End Sub
Final remark: The code loops through the sections and copies the headers and footers from the macro file into each one. It also saves the document as a PDF.
Upvotes: 1
Reputation: 17
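You can use header.text like this:
header = section.header
header.text = 'foobar'
See http://python-docx.readthedocs.io/en/latest/dev/analysis/features/header.html?highlight=header for more information. Note that this page is a design-analysis document; if the assignment has no visible effect in your installed version, set header.paragraphs[0].text instead.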
Upvotes: 0