Aroon

Reputation: 1039

How to read an MS Word 2003 (.doc) file in Python

I'm working on an analytics project, and for it I need to count some row values from MS Word documents. Files with a .docx extension are no problem, but I cannot read files with a .doc extension. What can I do about that? I'm using Python 3.6 and have the docx module installed. Thanks in advance!

Upvotes: 0

Views: 2040

Answers (1)

Andreas

Reputation: 221

You can do this using win32com:

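import os

from win32com.client import gencache, Dispatch

# Make sure the generated Python wrappers for the Word type library exist
# (the GUID identifies the Word type library; 8, 3 is its major/minor version)
gencache.EnsureModule('{00020905-0000-0000-C000-000000000046}', 0, 8, 3)

# Version-independent ProgID for the installed copy of Word
app = Dispatch("Word.Application")
# Open a document; pass an absolute path, since Word does not resolve
# relative paths against the script's working directory
doc = app.Documents.Open(os.path.abspath("MyDocument.doc"))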

Now you can do whatever you want with this document through Word's object model. If the line with gencache raises an error, you first need to generate the COM wrapper module by running:

lib\site-packages\win32com\client\makepy.py

This opens a window in which you need to select the 'Microsoft Word Object Library' entry.
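Since the question is about counting row values, here is a rough sketch of how the table rows could be pulled out once the document is open. It assumes the values live in ordinary Word tables without merged cells (Cell(r, c) can fail on merged cells), and the helper names clean_cell_text, table_rows and read_doc_rows are made up for this example. It needs Windows, an installed copy of Word and pywin32, and I have not tried it against Word 2003 specifically.

```python
import os


def clean_cell_text(raw):
    # Word ends each cell's text with "\r\x07"; strip those markers.
    return raw.rstrip("\r\x07").strip()


def table_rows(doc):
    """Yield every table row of an open Word document as a list of strings."""
    for table in doc.Tables:
        n_rows = table.Rows.Count
        n_cols = table.Columns.Count
        for r in range(1, n_rows + 1):  # Word collections are 1-based
            yield [clean_cell_text(table.Cell(r, c).Range.Text)
                   for c in range(1, n_cols + 1)]


def read_doc_rows(path):
    # Imported here so the helpers above can be used without pywin32.
    from win32com.client import Dispatch

    app = Dispatch("Word.Application")
    app.Visible = False
    doc = app.Documents.Open(os.path.abspath(path), ReadOnly=True)
    try:
        return list(table_rows(doc))
    finally:
        doc.Close(False)  # close without saving changes
        app.Quit()
```

Calling read_doc_rows("MyDocument.doc") should then give you a list of rows, each a list of cell strings, which you can count or sum with plain Python.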

Upvotes: 2
