Reputation: 3813
I want to treat an Outlook .msg file as a string and check whether a substring exists in it.
So I thought importing the win32 library, which is suggested in similar SO threads, would be overkill.
Instead, I tried to just open the file the same way as a .txt file:
file_path= 'O:\\MAP\\177926 Delete comiitted position.msg'
mail = open(file_path)
mail_contents = mail.read()
print(mail_contents)
However, I get
UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 870: character maps to <undefined>
Is there any decoding I can specify to make it work?
I have also tried
mail = open(file_path, encoding='utf-8')
which returns
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 in position 0: invalid continuation byte
Upvotes: 3
Views: 6908
Reputation: 11406
Unless you're willing to do a lot of work, you really should use a library for this.
First, a .msg file is a binary file, so its contents should not be read in as a string. In many implementations a string is terminated by a null byte, and binary files can contain a lot of those, which could mean you're not looking at all the data (this depends on the implementation).
Also, a .msg file can store plain ASCII and/or Unicode text in different parts/blocks of the file, so it would be really hard to treat it as one string and search it for a substring.
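If you still want to try without a library, a rough sketch is to read the file as bytes and look for the search text in both an ASCII and a UTF-16-LE encoding. The helper names `bytes_contain_text` and `msg_file_contains` are made up for this illustration, and the choice of those two encodings is an assumption about how the text is stored. Text that is compressed or split across storage blocks inside the file can still be missed, so treat a negative result with caution.

```python
def bytes_contain_text(data: bytes, needle: str) -> bool:
    """Return True if `needle` occurs in `data` as ASCII or UTF-16-LE bytes."""
    candidates = set()
    # Assumption: the text is stored either as ASCII or as UTF-16-LE.
    for encoding in ("ascii", "utf-16-le"):
        try:
            candidates.add(needle.encode(encoding))
        except UnicodeEncodeError:
            # The needle cannot be represented in this encoding; skip it.
            pass
    return any(candidate in data for candidate in candidates)


def msg_file_contains(path: str, needle: str) -> bool:
    """Open the file in binary mode so no text decoding is attempted."""
    with open(path, "rb") as handle:
        return bytes_contain_text(handle.read(), needle)
```

Usage would look like `msg_file_contains(file_path, 'Delete')`. Because the file is opened with `'rb'`, no `UnicodeDecodeError` can occur while reading.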
As an alternative you could save the mails as .eml (i.e. the plain-text version of an e-mail), but there would still be some problems to overcome in order to search for a specific text: for example, the body may be encoded in base64, in which case you would never find the text you're looking for.
Upvotes: 2
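For the .eml route mentioned in the answer above, the base64 problem can be sidestepped by letting Python's standard `email` package decode each text part before searching. This is only a sketch: `eml_contains` is a hypothetical helper name, and it assumes the mail has already been saved as .eml and read in as bytes.

```python
from email import message_from_bytes, policy


def eml_contains(raw: bytes, needle: str) -> bool:
    """Search the decoded text parts of a raw .eml message for `needle`."""
    # policy.default yields EmailMessage objects, whose get_content()
    # undoes base64 / quoted-printable and applies the declared charset.
    message = message_from_bytes(raw, policy=policy.default)
    for part in message.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and attachments
        try:
            content = part.get_content()
        except (LookupError, UnicodeError):
            continue  # unknown or broken charset; skip this part
        if needle in content:
            return True
    return False
```

With a file on disk this would be called as `eml_contains(open(path, 'rb').read(), 'committed')`. Headers such as the subject are not searched in this sketch.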
Reputation: 9997
When you face this type of issue, it is good practice to try Python's Latin-1 encoding.
mail = open(file_path, encoding='Latin-1')
We often confuse the Windows cp1252 encoding with Python's actual Latin-1. The latter maps all 256 possible byte values to the first 256 Unicode code points, so decoding never fails.
See this for more information.
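A quick way to see the difference between the two codecs is to try decoding every possible byte value with each. The helper `decodes_cleanly` is a name invented for this illustration.

```python
def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if `data` can be decoded with `encoding` without error."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


all_bytes = bytes(range(256))

# Latin-1 assigns a code point to every byte value, so this succeeds.
latin1_ok = decodes_cleanly(all_bytes, "latin-1")

# cp1252 leaves a few byte values undefined, 0x90 among them,
# which is the byte reported in the question's error message.
cp1252_ok = decodes_cleanly(b"\x90", "cp1252")

print(latin1_ok, cp1252_ok)
```

Note that this only makes the read succeed. Any UTF-16 text in the file will come out with a NUL character between the letters, so a plain substring search on the decoded string can still miss it.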
Upvotes: 1