Rajeev Srivastava

Reputation: 169

Displaying the contents of docx file using python

I am reading a .docx file and displaying its contents.

Let's say I have two files, abc.docx and xyz.docx, where abc.docx contains a table along with some paragraphs. I want to display the data as it appears in the document, but my code below only extracts the text and prints it. Can someone suggest how I can do this?

Below is my code:

import docxpy

file1 = 'abc.docx'
file2 = 'xyz.docx'

message1 = docxpy.process(file1)
message1 = message1.encode('ascii', 'ignore').decode('ascii')
message2 = docxpy.process(file2)
message2 = message2.encode('ascii', 'ignore').decode('ascii')

message = message1 + message2

print(message)

I need to display the data as it appears in the docx file. Currently the text inside my table is printed, but the table structure is lost. What can be done here?

Upvotes: 1

Views: 1924

Answers (1)

Patrick Artner

Reputation: 51683

With docxpy, you can't. From the docxpy documentation:

It is a pure python-based utility to extract text from docx files. The code is taken and adapted from python-docx. It can however also extract text from header, footer and hyperlinks. It can now also extract images.

Use Word to open the document, use LibreOffice to open the document, or use something that can convert Word's docx to PDF and then open the PDF or extract something from it.

You cannot extract tables with docxpy; it is built to extract text from Word files.

Searching SO, I found "python -docx to extract table from word docx"; maybe that is an option for what you want.
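To illustrate what "keeping the table" could look like, here is a minimal sketch that does not use docxpy or python-docx at all. It relies only on the standard library and on the fact that a .docx file is a zip archive whose main content lives in word/document.xml. The helper names (`paragraph_text`, `table_rows`, `format_table`, `render_docx`) and the pipe-separated output layout are my own choices for illustration, not part of any library. It walks the document body in order, so paragraphs and tables come out in the sequence they appear in the file.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx archive
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraph_text(p):
    """Join all text runs (w:t elements) found inside one paragraph."""
    return "".join(t.text or "" for t in p.iter(W + "t"))


def table_rows(tbl):
    """Return a w:tbl element as a list of rows, each a list of cell strings."""
    rows = []
    for tr in tbl.findall(W + "tr"):
        row = []
        for tc in tr.findall(W + "tc"):
            # A cell may hold several paragraphs; join them with spaces.
            row.append(" ".join(paragraph_text(p) for p in tc.findall(W + "p")))
        rows.append(row)
    return rows


def format_table(rows):
    """Render rows as a pipe-separated grid with left-padded columns."""
    if not rows:
        return ""
    ncols = max(len(r) for r in rows)
    # Pad short rows so every row has the same number of columns.
    rows = [r + [""] * (ncols - len(r)) for r in rows]
    widths = [max(len(r[i]) for r in rows) for i in range(ncols)]
    lines = []
    for r in rows:
        cells = " | ".join(c.ljust(w) for c, w in zip(r, widths))
        lines.append("| " + cells + " |")
    return "\n".join(lines)


def render_docx(path):
    """Return paragraphs and tables of a .docx as text, in document order."""
    with zipfile.ZipFile(path) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    body = root.find(W + "body")
    parts = []
    for child in body:
        if child.tag == W + "p":
            parts.append(paragraph_text(child))
        elif child.tag == W + "tbl":
            parts.append(format_table(table_rows(child)))
    return "\n".join(parts)
```

With the file names from the question, usage would be something like `print(render_docx('abc.docx') + "\n" + render_docx('xyz.docx'))`. This is only a sketch: it ignores merged cells, tables nested inside cells, headers, footers, and all formatting, so a library such as python-docx is likely the better choice for anything beyond simple tables.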

Upvotes: 1
