James Cooper

Reputation: 57

Processing .docm files with Python?

I have 700+ Word documents (mostly in .docm format), the contents being a mixture of text and tables.

I'm trying to extract info from the tables. After much searching, the only Python library I've found that can detect tables is python-docx, which breaks when pointed at .docm files. A GitHub issue thread on the project indicates this hasn't been addressed.

Further searching suggests that converting .docm to .docx would require me to learn VB or C#, which isn't happening in the timescale I have unless I can find an absolutely clear, explain-like-I'm-five kind of solution.

Is there any way to achieve this or a potential alternative route?

Upvotes: 1

Views: 3784

Answers (2)

Doug

Reputation: 232

The Python library you are looking for is called docx2python. I had an almost identical problem, and it worked great with .docm files.
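Since a .docm file is a ZIP package with the same basic layout as a .docx (the body lives in word/document.xml), tables can also be read with only the standard library. This is a minimal sketch, not docx2python's API: it assumes the main part is stored at word/document.xml, as is usual, and it only reads tables sitting directly in the document body.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by document.xml in both .docx and .docm
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_tables(path):
    """Return each top-level table as a list of rows, each row a list of cell strings.

    Assumes the main document part is at word/document.xml (the usual location).
    `path` may be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(path) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    body = root.find(W + "body")
    tables = []
    # Direct children of the body only: nested tables and tables inside
    # content controls are skipped by this sketch.
    for tbl in body.findall(W + "tbl"):
        rows = []
        for tr in tbl.findall(W + "tr"):
            cells = []
            for tc in tr.findall(W + "tc"):
                # Join the text runs of each paragraph, then join paragraphs with newlines
                paragraphs = [
                    "".join(t.text or "" for t in p.iter(W + "t"))
                    for p in tc.findall(W + "p")
                ]
                cells.append("\n".join(paragraphs))
            rows.append(cells)
        tables.append(rows)
    return tables
```

Usage, with a hypothetical filename: extract_tables("report.docm") returns a list of tables that can be looped over row by row. Merged cells are not expanded, so rows containing horizontally or vertically merged cells may have fewer entries than the table grid.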

Upvotes: 1

JMRog

Reputation: 1

In fact, you can use the python-docx library with a minor modification, as detailed in this pull request on the project's GitHub, which unfortunately has not been accepted.

Pull Request #716: added support for .docm files

It consists of three small modifications to the library files that enable loading .docm files. It will let you load these files and read and modify their standard parts, but it will not let you work with the macros.
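If patching the installed library isn't appealing, it may be enough to relabel the file instead. This is a sketch, not the code from the pull request: it assumes python-docx rejects .docm files only because the main document part carries the macro-enabled content type, and it copies the package to a new file with that content type swapped for the standard .docx one in [Content_Types].xml.

```python
import zipfile

# Content types of the main document part in macro-enabled and plain Word packages
DOCM_MAIN = "application/vnd.ms-word.document.macroEnabled.main+xml"
DOCX_MAIN = "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"


def docm_to_docx(src, dst):
    """Copy the ZIP package at src to dst, relabelling the main part as a plain document.

    Every part, including the macro part, is copied unchanged; only the
    content type in [Content_Types].xml is rewritten. The copy is intended
    for reading with python-docx, not for reopening in Word.
    """
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "[Content_Types].xml":
                # Swap the macro-enabled main-part content type for the .docx one
                data = data.replace(DOCM_MAIN.encode(), DOCX_MAIN.encode())
            zout.writestr(item, data)
```

Usage, with hypothetical filenames: call docm_to_docx("report.docm", "report.docx"), then open the copy with docx.Document("report.docx") and iterate over its tables attribute. The original .docm is left untouched.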

Upvotes: 0
