Reputation: 169
I am reading a docx file and displaying its contents.
Let's say I have two files, abc.docx and xyz.docx, where abc.docx contains a table along with some paragraphs. I want to display the data as it appears in the docx, but my code below only extracts the text and prints it. Can someone suggest how I can do this?
Below is my code:
import docxpy
file1 = 'abc.docx'
file2 = 'xyz.docx'
# docxpy.process() returns only the plain text of the document
message1 = docxpy.process(file1)
# Drop any non-ASCII characters
message1 = message1.encode('ascii', 'ignore').decode('ascii')
message2 = docxpy.process(file2)
message2 = message2.encode('ascii', 'ignore').decode('ascii')
message = message1 + message2
print(message)
I need to display the data as it appears in the docx file. At the moment the text inside my table is displayed, but not the table itself. What can be done here?
Upvotes: 1
Views: 1924
Reputation: 51683
With docxpy, you can't. From the docxpy documentation:
It is a pure python-based utility to extract text from docx files. The code is taken and adapted from python-docx. It can however also extract text from header, footer and hyperlinks. It can now also extract images.
Use Word or LibreOffice to open the document, or use something that can convert Word's docx to PDF and then open the PDF or extract what you need from it.
You cannot extract tables with docxpy; it is built to extract text from Word files.
Searching SO, I found "python -docx to extract table from word docx"; maybe python-docx is an option for what you want.
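As a rough sketch of that python-docx route (not tested against your files): the snippet below walks the document body in order, prints paragraphs as text, and draws each table as a plain-text grid. It assumes python-docx is installed (pip install python-docx). The helper names format_table and docx_to_text are made up for this example, and the ASCII grid is only one possible way of showing a table on a console.

```python
def format_table(rows):
    """Render rows (lists of cell strings) as a plain-text grid."""
    if not rows:
        return ""
    n_cols = max(len(r) for r in rows)
    # Pad short rows so every row has the same number of columns.
    padded = [list(r) + [""] * (n_cols - len(r)) for r in rows]
    widths = [max(len(row[i]) for row in padded) for i in range(n_cols)]
    sep = "+" + "+".join("-" * (w + 2) for w in widths) + "+"
    lines = [sep]
    for row in padded:
        cells = " | ".join(c.ljust(w) for c, w in zip(row, widths))
        lines.append("| " + cells + " |")
        lines.append(sep)
    return "\n".join(lines)


def docx_to_text(path):
    """Return paragraphs and tables of a .docx file in document order."""
    # Third-party package; imported here so format_table() can be
    # used on its own without python-docx installed.
    from docx import Document
    from docx.table import Table
    from docx.text.paragraph import Paragraph

    doc = Document(path)
    parts = []
    # doc.paragraphs and doc.tables are separate lists, so iterate over
    # the body's child elements to keep the original ordering.
    for child in doc.element.body.iterchildren():
        if child.tag.endswith("}p"):        # w:p   -> paragraph
            parts.append(Paragraph(child, doc).text)
        elif child.tag.endswith("}tbl"):    # w:tbl -> table
            table = Table(child, doc)
            rows = [[cell.text.replace("\n", " ") for cell in row.cells]
                    for row in table.rows]
            parts.append(format_table(rows))
    return "\n".join(parts)
```

With the files from the question, usage would look something like print(docx_to_text('abc.docx') + "\n" + docx_to_text('xyz.docx')). This only reproduces the table structure as text; fonts, colours and other formatting are not preserved.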
Upvotes: 1