Reputation: 26336
I may get slammed because this question is too broad, but I am going to ask anyway, because what else can I do? Digging through the Python source code should surely earn me enough "good effort" points to warrant some help?
I am trying to use Python 3.4's new email content manager http://docs.python.org/dev/library/email.contentmanager.html#content-manager-instances
It is my understanding that this should allow me to read an email message and then access all the header fields and the body as UTF-8, without going through the painful process of decoding from whatever weird encoding back into clean UTF-8. I understand it also handles parsing of date headers and email address headers, generally making life easier for reading emails in Python. Great stuff, very interesting.
However, I am a beginner programmer, and the current documentation has no examples that start from the very beginning. I need a simple example showing how to read an email file and, using the new email content manager, read back the header fields, address fields and body.
I have dug into the Python 3.4 source code and looked at the tests for the email content manager. I will admit to being sufficiently amateurish that I was too confused to glean enough from the tests to start writing my own simple example.
So, is anyone willing to help with a simple example of how to use the Python 3.4 email content manager to read the header fields and body and address fields of an email?
Thanks.
Upvotes: 8
Views: 2418
Reputation: 11300
If you have an email in a file and want to read it into Python, email.parser is the module you should probably look at first. Like Brandon, I don't quite see the need for using the contentmanager, but maybe your question is too broad and you need to help me understand it better.
Code could look like:
import email.parser

filename = 'your_file_here.email.txt'
with open(filename, 'r') as fh:
    # Parse the open file into a message object
    message = email.parser.Parser().parse(fh)
There are even convenience functions, and the one for your case would be:
import email
with open('your_file_here.email.txt') as fh:
    message = email.message_from_file(fh)  # takes an open file object, not a filename
Then check the docs on email.message to see how to access the message's content. With is_multipart() you can check whether it's a single monolithic block of text or a MIME message consisting of multiple parts. In the latter case, there's walk() to iterate over each part.
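To make the is_multipart() and walk() part concrete, here is a minimal sketch. The sample message, its addresses and the list_parts helper are made up for illustration; with a real file you would get the message from the parsing code above instead.

```python
import email

# Hypothetical two-part sample message, used only for illustration.
RAW = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Hello\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Plain body\n"
    "--XYZ\n"
    "Content-Type: text/html; charset=utf-8\n"
    "\n"
    "<p>HTML body</p>\n"
    "--XYZ--\n"
)

message = email.message_from_string(RAW)


def list_parts(msg):
    """Return (content type, decoded text) for every non-container part."""
    parts = []
    for part in msg.walk():
        # walk() also yields the multipart containers themselves; skip them.
        if part.is_multipart():
            continue
        payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
        charset = part.get_content_charset() or "utf-8"
        parts.append(
            (part.get_content_type(), payload.decode(charset, errors="replace"))
        )
    return parts


for content_type, text in list_parts(message):
    print(content_type, repr(text))
```

Note that this is the older, manual route: you have to decode the payload bytes yourself using the charset the part declares.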
Upvotes: 0
Reputation: 89454
First: the “address fields” in an email are in fact simply headers whose names have been agreed upon in standards, like To and From. So all you need are the email headers and body and you are done.
Given a modern contentmanager-powered EmailMessage instance such as Python 3.4 returns if you specify a policy (like email.policy.default) when reading in an email message, you can access its auto-decoded headers by treating it like a Python dictionary, and its body with the get_body() call. Here is an example script I wrote that does both maneuvers in a safe and standard way:
https://github.com/brandon-rhodes/fopnp/blob/m/py3/chapter12/display_email.py
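In case the link is not at hand, here is a minimal sketch of the approach described above. It is not a copy of the linked script, and the sample message and its addresses are invented for illustration. It assumes the message is parsed with email.policy.default, so that headers come back decoded and address and date headers expose structured attributes.

```python
import email
from email import policy

# Hypothetical sample message with encoded headers and a quoted-printable body.
RAW = (
    b"From: =?utf-8?q?Andr=C3=A9?= <andre@example.com>\r\n"
    b"To: Bob <bob@example.com>, carol@example.com\r\n"
    b"Subject: =?utf-8?b?Q2Fmw6k=?=\r\n"
    b"Date: Tue, 01 Apr 2014 12:30:00 +0000\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Caf=C3=A9 au lait\r\n"
)

# Passing a policy is what makes the parser return an EmailMessage.
message = email.message_from_bytes(RAW, policy=policy.default)

# Headers: dictionary-style access returns already-decoded strings.
subject = message["Subject"]

# Address headers: each entry has display_name and addr_spec attributes.
sender = message["From"].addresses[0]
recipients = [address.addr_spec for address in message["To"].addresses]

# Date header: available as a datetime object.
sent = message["Date"].datetime

# Body: pick the preferred text part, then let the content manager decode it.
body = message.get_body(preferencelist=("plain", "html"))
text = body.get_content() if body is not None else ""

print(subject, sender.display_name, recipients, sent, text.strip())
```

For a message stored on disk, email.message_from_binary_file(fh, policy=policy.default) with the file opened in 'rb' mode should give you the same kind of object.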
Behind the scenes, the policy is what is really in charge of what happens to both headers and content: the default policy automatically subjects headers to the encoding and decoding functions in email.utils, and content to the logic you asked about that is inside of contentmanager.
But as the caller you usually will not need to know the behind-the-scenes magic, because headers will “just work” and content can be easily accessed through the methods illustrated in the above script.
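If you are curious how much the policy changes, this small sketch parses the same (made-up) message twice: once without a policy, which falls back to the backward-compatible compat32 behaviour, and once with email.policy.default.

```python
import email
from email import policy
from email.message import EmailMessage

# Hypothetical one-header message; the subject is an RFC 2047 encoded word.
RAW = "Subject: =?utf-8?b?Q2Fmw6k=?=\n\nBody text\n"

# No policy given: legacy Message object, header value is left encoded.
legacy = email.message_from_string(RAW)

# Explicit modern policy: EmailMessage object, header value is decoded.
modern = email.message_from_string(RAW, policy=policy.default)

print(type(legacy).__name__, repr(legacy["Subject"]))
print(type(modern).__name__, repr(str(modern["Subject"])))
```

The legacy object also lacks get_body() and get_content(), which is why specifying the policy when parsing matters.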
Upvotes: 4