Reputation: 13
Hello, I am currently trying to write code that combines .docx files. These files may contain text, images, tables, or equations. The code copies these objects and appends them to a base .docx. I can copy and merge text, images, and tables using the docx module's 'add_picture' and 'add_paragraph' methods, but I cannot do this for Word equations. So I decided to dig into the XML of the .docx and copy the equation section from there. I am able to append equations to my base document, but when I continue appending pictures, text, and tables, the equations end up at the end of the .docx. My questions are: why does this happen if I loop through the appended objects in the order I want them to appear, and is there a way to keep the code from putting the equations at the end of the .docx?
Here is an overview of the code.
Create the base document:
document = Document('basedoc.docx')
For each block item of the sub-document, I record its type, its style, and whether an equation is present:
if isinstance(block, Paragraph):
    if "r:embed" in block._element.xml:
        append to the content, style, and equation arrays (content is a drawing/image)
    elif "m:oMathPara" in block._element.xml:
        append to the content, style, and equation arrays (content is an equation)
        equationXml.append(block._element.xml)
    elif 'w:br w:type="page"' in block._element.xml:
        append to the content, style, and equation arrays (content is a page break)
    else:
        append to the content, style, and equation arrays (content is text)
else:
    append to the content, style, and equation arrays (content is a table)
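As an aside, substring tests on block._element.xml are fragile; for example, inline equations use <m:oMath> directly inside the paragraph with no <m:oMathPara> wrapper, so the "m:oMathPara" test misses them. A minimal sketch of a namespace-aware alternative using only the standard library's xml.etree.ElementTree, assuming each <w:body> child is available as a parsed element. `classify_block` and the sample XML are hypothetical illustrations, not part of python-docx, and the sample drawing is simplified (a real <a:blip> sits deeper inside <wp:inline>/<a:graphic>, which iter() still finds):

```python
import xml.etree.ElementTree as ET

# Standard Office Open XML namespace URIs
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
M = "http://schemas.openxmlformats.org/officeDocument/2006/math"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def classify_block(elem):
    """Hypothetical helper: label one direct child of <w:body>."""
    if elem.tag == f"{{{W}}}tbl":
        return "table"
    if elem.tag != f"{{{W}}}p":
        return "other"  # e.g. the trailing <w:sectPr>
    # Any descendant carrying an r:embed attribute (such as <a:blip>) -> image
    if any(f"{{{R}}}embed" in e.attrib for e in elem.iter()):
        return "image"
    # <m:oMath> covers both display math (inside <m:oMathPara>) and inline math
    if elem.find(f".//{{{M}}}oMath") is not None:
        return "equation"
    for br in elem.iter(f"{{{W}}}br"):
        if br.get(f"{{{W}}}type") == "page":
            return "page_break"
    return "text"


sample = f"""<w:body xmlns:w="{W}" xmlns:m="{M}" xmlns:r="{R}"
    xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
  <w:p><w:r><w:t>Hello</w:t></w:r></w:p>
  <w:p><m:oMathPara><m:oMath><m:r><m:t>x</m:t></m:r></m:oMath></m:oMathPara></w:p>
  <w:p><w:r><w:drawing><a:blip r:embed="rId5"/></w:drawing></w:r></w:p>
  <w:p><w:r><w:br w:type="page"/></w:r></w:p>
  <w:tbl/>
  <w:sectPr/>
</w:body>"""

labels = [classify_block(child) for child in ET.fromstring(sample)]
print(labels)  # ['text', 'equation', 'image', 'page_break', 'table', 'other']
```

Checking for the element itself rather than a serialized substring also avoids depending on the exact namespace prefixes a given document happens to use.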
Once I have my content and style arrays, I loop through the content array and append tables, drawings, page breaks, and text:
if equationXml[i] == '0':  # the content is an image, table, text, or page break
    if "Table" in str(contentStyle[i]):
        insert table and caption
    else:
        if "drawing" in content[i]:
            insert image and caption
        elif "pageBreak" in content[i]:
            document.add_page_break()
        else:
            insert text
else:  # there is an equation present
    document = EquationInsert.AddEquation(document, equationXml[i])
My EquationInsert file has a function called 'AddEquation' in which I basically rebuild my document object (UpdateableZipFile is a snippet I found online that quickly updates a file inside a zip archive):
import os
import zipfile
import xml.etree.ElementTree as ET
from lxml import etree
from docx import Document
import UpdateableZipFile  # third-party snippet found online

def AddEquation(document, equationContent):
    document.save('temp.docx')
    z = zipfile.ZipFile('temp.docx')
    tree = etree.parse(z.open('word/document.xml'))
    nmspcDict = next(tree.getroot().iter()).nsmap
    for key in nmspcDict:
        ET.register_namespace(key, nmspcDict[key])
    tree2 = etree.ElementTree(etree.fromstring(equationContent))
    xmlRoot2 = tree2.getroot()
    xmlRoot = tree.getroot()
    xmlRoot[1].append(xmlRoot2)  # note that [1] had to be used because [0] was a comment; need to check whether this is the general case
    tree.write("document.xml", encoding="utf-8", xml_declaration=True, standalone="yes", pretty_print=True)
    with UpdateableZipFile.UpdateableZipFile("temp.docx", "a") as o:
        o.write("document.xml", "word/document.xml")
    document = Document('temp.docx')
    os.remove('document.xml')
    z.close()
    os.remove('temp.docx')
    return document
This code adds the equation, but as the main code continues to loop through the sub-document items, the equations somehow get pushed to the end of the base document. I've tried returning a docx from the EquationInsert function and creating a new document from it, but that didn't change anything. If anyone has advice on how to keep the equations from going to the end of the file, it would be much appreciated. Otherwise I'll have to look into converting these equations into images =/ or something else that python-docx can handle. I'm open to solutions, suggestions, and comments. Thanks!
Upvotes: 1
Views: 2341
Reputation: 28883
I'm sure you'll find your answer in the XML. You can conveniently browse an XML "part" in a .docx "package" using opc-diag.
The paragraphs and tables in a Word document are located in the document.xml part, as child elements under the <w:body> element. The last element in <w:body> is a section element (<w:sectPr>). python-docx adds each new paragraph and table directly before that sectPr element. If you're appending your equations after it (which is what xmlRoot[1].append(...) does, assuming xmlRoot[1] is <w:body>), they will continue to float to the bottom as new paragraphs and tables are added above that sectPr element.
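A minimal standard-library sketch of that ordering effect, using ElementTree on a bare <w:body> rather than a real document. `add_before_sectpr` is a hypothetical helper that mimics where python-docx places new block items, and the "EQUATION" paragraph stands in for a copied equation paragraph. With python-docx's lxml elements, the analogous fix would presumably be to call lxml's `addprevious()` on the body's <w:sectPr> element instead of appending to the body (an assumption, not tested here):

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def q(tag):
    """Clark-notation name for a w: element, e.g. q("p") -> "{...}p"."""
    return f"{{{W}}}{tag}"


def make_p(text):
    """Build <w:p><w:r><w:t>text</w:t></w:r></w:p>."""
    p = ET.Element(q("p"))
    run = ET.SubElement(p, q("r"))
    ET.SubElement(run, q("t")).text = text
    return p


def add_before_sectpr(body, elem):
    """Insert elem directly before <w:sectPr> (append if there is none)."""
    sect = body.find(q("sectPr"))
    if sect is None:
        body.append(elem)
    else:
        body.insert(list(body).index(sect), elem)


def sequence(body):
    """Label each <w:body> child by its first w:t text, else its local tag name."""
    labels = []
    for child in body:
        t = child.find(".//" + q("t"))
        labels.append(t.text if t is not None else child.tag.split("}")[1])
    return labels


def build(fixed):
    body = ET.Element(q("body"))
    ET.SubElement(body, q("sectPr"))
    add_before_sectpr(body, make_p("intro"))  # like document.add_paragraph()
    equation = make_p("EQUATION")             # stand-in for the copied equation paragraph
    if fixed:
        add_before_sectpr(body, equation)     # keep it ahead of sectPr
    else:
        body.append(equation)                 # lands after sectPr
    add_before_sectpr(body, make_p("after"))  # later content goes before sectPr
    return sequence(body)


print(build(fixed=False))  # ['intro', 'after', 'sectPr', 'EQUATION']
print(build(fixed=True))   # ['intro', 'EQUATION', 'after', 'sectPr']
```

In the unfixed run, everything added after the equation slides in ahead of sectPr, so the equation ends up last no matter how late in the loop it was appended.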
I would work with a short-as-possible test document and examine the XML produced by your code, comparing it to one that looks the way you want, perhaps created by hand in Word. That should quickly point up any element sequencing problems you have in your code.
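For that comparison, a small standard-library helper can list the order of the direct children of <w:body> in a saved .docx and flag paragraphs that contain math. `body_sequence` is a hypothetical helper, not part of python-docx or opc-diag; any entry listed after "sectPr" points to the sequencing problem described above:

```python
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
M = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def body_sequence(docx):
    """Return a label for each direct child of <w:body>, in document order.

    `docx` may be a path or a binary file-like object (anything zipfile accepts).
    """
    with zipfile.ZipFile(docx) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    body = root.find(f"{{{W}}}body")
    labels = []
    for child in body:
        label = child.tag.rsplit("}", 1)[-1]  # local name: p, tbl, sectPr, ...
        if child.find(f".//{{{M}}}oMath") is not None:
            label += "[math]"
        labels.append(label)
    return labels
```

Usage would be something like print(body_sequence("temp.docx")), run on both the generated file and a hand-made reference document from Word.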
Upvotes: 1