Reputation: 21294
I'm trying to open a Word document that has a password.
I'm using the docx package, which is a bit old:
from docx import opendocx, getdocumenttext
and further on
document = opendocx(filename)
Is there an option on opendocx that lets it open password-protected Word documents? I do know the password. I checked the GitHub repo at https://github.com/mikemaccana/python-docx but didn't see one. I'd like to avoid rewriting the code to use a newer package, but that may be inevitable.
Upvotes: 5
Views: 8635
Reputation: 63
The pywin32 API can open a password-protected .doc or .docx file. Calling win32com.client.Dispatch('Word.Application') starts the Word application and returns an object that represents it. You can then use the Open method of that object's Documents collection to load the file, passing the password as the 5th parameter.
Here is my code:
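import win32com.client
word = win32com.client.Dispatch('Word.Application')
word.Visible = False
word.DisplayAlerts = False
# Open(FileName, ConfirmConversions, ReadOnly, AddToRecentFiles, PasswordDocument)
doc = word.Documents.Open(document_path, False, True, None, psw)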
The parameter psw is the password.
This method does not seem to support multi-threaded programs: I got an error when I created a new thread.
The docx module does not seem to support encrypted files.
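The thread error itself isn't shown above, but a common cause with pywin32 is that COM has not been initialized on the new thread. Here is a minimal sketch, assuming that is the problem: it calls pythoncom.CoInitialize() inside the worker thread before touching Word. The helper names run_in_thread and read_protected_text are made up for illustration, and the Word part only runs on Windows with Word and pywin32 installed.

```python
import threading


def run_in_thread(func, *args):
    """Run func(*args) on a new thread and return its result (or re-raise its error)."""
    result = {}

    def target():
        try:
            result['value'] = func(*args)
        except Exception as exc:  # hand the error back to the calling thread
            result['error'] = exc

    worker = threading.Thread(target=target)
    worker.start()
    worker.join()
    if 'error' in result:
        raise result['error']
    return result['value']


def read_protected_text(document_path, psw):
    """Open a password-protected document with Word and return its text."""
    # Imported here so the module can still be loaded on machines without pywin32.
    import pythoncom
    import win32com.client

    pythoncom.CoInitialize()  # assumption: the missing step behind the thread error
    try:
        word = win32com.client.Dispatch('Word.Application')
        word.Visible = False
        word.DisplayAlerts = False
        try:
            doc = word.Documents.Open(document_path, False, True, None, psw)
            try:
                return doc.Content.Text
            finally:
                doc.Close(False)  # close without saving changes
        finally:
            word.Quit()
    finally:
        pythoncom.CoUninitialize()
```

Usage would look like text = run_in_thread(read_protected_text, document_path, psw).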
Upvotes: 4
Reputation: 16354
python-docx doesn't support passwords at the moment. I didn't find anything in the code either, but to be sure, I asked on the python-docx mailing list and received the following reply:
Sorry, no. At least there's no built-in feature for it. I'm not sure how all that works with Word, it might be worth some research.
If it uses the Zip archive's password protection, you could open the .docx file (which is a Zip at the top level), and then do something I'm sure to feed it in. Worst case you could save it as another zip without a password and use that. And of course the interim zip could be a StringIO in-memory file.
If they use their own encryption I expect it would be quite a bit harder :)
Docx uses its own encryption, not Zip encryption. This way only the internal contents need to be encrypted. Some info on decrypting docx files is available here:
One approach that you can use if you don't want to change your code is to fork the docx package and add code to decrypt the docx file. If you had another program to decrypt the document, you could also shell out to do the decryption.
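As a rough sketch of the shell-out idea, the snippet below runs an external decryption program and returns the path of the decrypted copy, which the existing code could then open. It assumes the third-party msoffcrypto-tool command-line program is installed and is invoked as "msoffcrypto-tool INPUT OUTPUT -p PASSWORD"; that tool is not mentioned in this thread, and the helper names build_decrypt_command and decrypt_to_temp are hypothetical.

```python
import os
import subprocess
import tempfile


def build_decrypt_command(src, dst, password, tool="msoffcrypto-tool"):
    """Return the argv list for the external decryption tool (assumed CLI shape)."""
    return [tool, src, dst, "-p", password]


def decrypt_to_temp(src, password):
    """Decrypt src into a fresh temporary directory and return the new file's path."""
    tmpdir = tempfile.mkdtemp()
    dst = os.path.join(tmpdir, "decrypted.docx")
    # check=True raises CalledProcessError if the tool exits with a non-zero status.
    subprocess.run(build_decrypt_command(src, dst, password), check=True)
    return dst
```

With this in place the original call could become document = opendocx(decrypt_to_temp(filename, password)). Note that a password passed on the command line can be visible to other users in process listings, and the caller is responsible for deleting the decrypted copy afterwards.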
Upvotes: 7
Reputation: 12640
If the .docx has write-protection only, I'd have thought the docx package should work as is, since it probably ignores the relevant bit of XML. For read-protection, the MS-OFFCRYPTO format is described in detail on Microsoft's website at https://msdn.microsoft.com/en-us/library/office/cc313071%28v=office.12%29.aspx?f=255&MSPPError=-2147217396 . That document includes pseudocode. There is a C# implementation at https://www.lyquidity.com/devblog/?p=35 . It would in theory be possible to implement all this in Python, but it's going to be a lot of additional work on top of what the current package does, which is focused on XML and text processing.
I think the only other option at present would be to decrypt the document using MS Word or LibreOffice and then use an alternative means to keep the file encrypted in a format which is accessible to Python.
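To tell the two cases apart before handing a file to the docx package, one option is to look at the container. A write-protected .docx should still be an ordinary Zip archive, whereas a read-protected (encrypted) one is stored as an OLE compound file, as described in MS-OFFCRYPTO. The sketch below checks the file signature; the function name classify_docx is made up for illustration, and an "ole" result could also mean a legacy .doc file that was renamed.

```python
import zipfile

# Signature at the start of every OLE compound file (used by encrypted .docx and legacy .doc).
OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"


def classify_docx(path):
    """Return 'ole', 'zip' or 'unknown' depending on the file's container format."""
    with open(path, "rb") as f:
        header = f.read(8)
    if header == OLE_MAGIC:
        return "ole"      # probably encrypted: needs decrypting first
    if zipfile.is_zipfile(path):
        return "zip"      # plain .docx: the docx package can open it
    return "unknown"
```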
Upvotes: 0