Reeza

Reputation: 21294

Opening a Word document that has a password using docx library

I'm trying to open a Word document that has a password.

I'm using the docx package, which is a bit old:

from docx import opendocx, getdocumenttext

and further on

 document = opendocx(filename)

I was wondering whether opendocx has an option that lets it open password-protected Word documents. I do know the password. I checked the GitHub repo here: https://github.com/mikemaccana/python-docx but didn't see such an option. I'm trying to avoid rewriting the code to use a newer package, but that may be inevitable.

Upvotes: 5

Views: 8635

Answers (3)

李卓然

Reputation: 63

The pywin32 API can open a password-protected .doc or .docx file. The function win32com.client.Dispatch starts the Word application and returns an object for it. You can then use the Documents.Open method of that object to load the file, passing the password as the 5th parameter. Here is my code:

    import win32com.client

    word = win32com.client.Dispatch('Word.Application')
    word.Visible = False
    word.DisplayAlerts = False
    doc = word.Documents.Open(document_path, False, True, None, psw)

The parameter psw is the password.

This method does not seem to support multi-threaded programs. I got an error when I called it from a new thread.

The docx module does not seem to support encrypted files.

Upvotes: 4

Grokify

Reputation: 16354

python-docx doesn't support passwords at the moment. I didn't find anything in the code either, but to be sure, I asked on the python-docx mailing list and received the following reply:

Sorry, no. At least there's no built-in feature for it. I'm not sure how all that works with Word, it might be worth some research.

If it uses the Zip archive's password protection, you could open the .docx file (which is a Zip at the top level), and then do something I'm sure to feed it in. Worst case you could save it as another zip without a password and use that. And of course the interim zip could be a StringIO in-memory file.

If they use their own encryption I expect it would be quite a bit harder :)

.docx files use their own encryption, not Zip encryption. This way only the internal contents need to be encrypted. Some info on decrypting docx files is available here:

One approach that you can use if you don't want to change your code is to fork the docx package and add code to decrypt the docx file. If you had another program to decrypt the document, you could also shell out to do the decryption.
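As a rough sketch of the shell-out idea: the helper below runs an external decryptor and hands back the path of the decrypted copy, which could then be passed to opendocx. Everything here is an assumption for illustration. The name decrypt_with_external_tool is made up, and so is the calling convention (the tool is assumed to take the input path and the output path as its last two arguments). It uses subprocess.run, so it needs Python 3.5 or later.

```python
import os
import subprocess
import tempfile


def decrypt_with_external_tool(argv_prefix, encrypted_path):
    """Run `argv_prefix + [encrypted_path, output_path]` and return output_path.

    argv_prefix is the decryptor's command line without the two file
    arguments (hypothetical convention; adapt it to the real tool).
    """
    # Reserve a temporary file name for the decrypted copy.
    fd, output_path = tempfile.mkstemp(suffix=".docx")
    os.close(fd)
    try:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(list(argv_prefix) + [encrypted_path, output_path], check=True)
    except Exception:
        # Do not leave an empty or partial file behind on failure.
        os.remove(output_path)
        raise
    return output_path
```

The decrypted copy sits on disk in plain form, so the caller should delete it (os.remove) as soon as the document has been read.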

Upvotes: 7

Nick Kennedy

Reputation: 12640

If the .docx has write-protection only, I'd have thought the docx package should work as is, since it probably ignores the relevant bit of XML. For read-protection, the MS-OFFCRYPTO format is described in detail on Microsoft's website at https://msdn.microsoft.com/en-us/library/office/cc313071%28v=office.12%29.aspx?f=255&MSPPError=-2147217396 . That document includes pseudocode, and there is a C# implementation at https://www.lyquidity.com/devblog/?p=35 . It would in theory be possible to implement all this in Python, but it would be a lot of additional work on top of what the current package does, which is focused on XML and text processing.
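To tell which of the two cases a given file falls into, one can look at the container. As far as I know, a read-protected (encrypted) .docx is stored in an OLE compound file rather than as a plain Zip archive, so the first bytes of the file differ. A minimal sketch using only the standard library follows; container_kind is a name I made up for the example, and note that legacy .doc files are also OLE compound files, so "ole" on its own does not prove encryption.

```python
import zipfile

# Signature found at the start of every OLE compound file.
OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def container_kind(path):
    """Return "ole", "zip" or "unknown" for the file at path."""
    with open(path, "rb") as fh:
        header = fh.read(len(OLE_MAGIC))
    if header == OLE_MAGIC:
        # Possibly an encrypted .docx (or a legacy .doc).
        return "ole"
    if zipfile.is_zipfile(path):
        # Ordinary .docx package that a Zip-based reader can open.
        return "zip"
    return "unknown"
```

If this returns "zip", the file should be readable by the docx package as it is. If it returns "ole" for a file with a .docx extension, it most likely needs decrypting first.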

Otherwise, I think the only option at present is to decrypt the document using MS Word or LibreOffice, and then use some other means of keeping the file encrypted in a format that is accessible to Python.

Upvotes: 0
