Reputation: 10554
I have a parsed email object, created like this:
fp = open(self.currentEmailPath, "rb")
p = email.Parser.Parser()
self._currentEmailParsedInstance= p.parse(fp)
fp.close()
From this object, self._currentEmailParsedInstance, I want to get the body of the email as text only, no HTML.
How do I do it?
Something like this?
newmsg=self._currentEmailParsedInstance.get_payload()
body=newmsg[0].get_content....?
and then strip the HTML from body? What is the method that returns the actual text? Maybe I misunderstand you.
msg=self._currentEmailParsedInstance.get_payload()
print type(msg)
Output: <type 'list'>
The email:
Return-Path:
Received: from xx.xx.net (example) by mxx3.xx.net (xxx)
id 485EF65F08EDX5E12 for [email protected]; Thu, 23 Oct 2008 06:07:51 +0200
Received: from xxxxx2 (ccc) by example.net (ccc) (authenticated as [email protected])
id 48798D4001146189 for [email protected]; Thu, 23 Oct 2008 06:07:51 +0200
From: "example"
To:
Subject: FW: example
Date: Thu, 23 Oct 2008 12:07:45 +0800
Organization: example
Message-ID: <001601c934c4$xxxx30$a9ff460a@xxx>
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="----=_NextPart_000_0017_01C93507.F6F64E30"
X-Mailer: Microsoft Office Outlook 11
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3138
Thread-Index: Ack0wLaumqgZo1oXSBuIpUCEg/wfOAABAFEA
This is a multi-part message in MIME format.
------=_NextPart_000_0017_01C93507.F6F64E30
Content-Type: multipart/alternative;
boundary="----=_NextPart_001_0018_01C93507.F6F64E30"
------=_NextPart_001_0018_01C93507.F6F64E30
Content-Type: text/plain;
charset="us-ascii"
Content-Transfer-Encoding: 7bit
From: example.example[mailto:[email protected]]
Sent: Thursday, October 23, 2008 11:37 AM
To: [email protected]
Subject: S/I for example(B/L
No.:4357-0120-810.044)
Please find attached the example.doc),
Thanks.
B.rgds,
xxx xxx
------=_NextPart_001_0018_01C93507.F6F64E30
Content-Type: text/html;
charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
xmlns:o=3D"urn:schemas-microsoft-com:office:office" =
xmlns:w=3D"urn:schemas-microsoft-com:office:word" =
xmlns:st1=3D"urn:schemas-microsoft-com:office:smarttags" =
xmlns=3D"http://www.w3.org/TR/REC-html40">
HTML STUFF till
------=_NextPart_001_0018_01C93507.F6F64E30--
------=_NextPart_000_0017_01C93507.F6F64E30
Content-Type: application/msword;
name="xxxx.doc"
Content-Transfer-Encoding: base64
Content-Disposition: attachment;
filename="xxxx.doc"
0M8R4KGxGuEAAAAAAAAAAAAAAAAAAAAAPgADAP7/CQAGAAAAAAAAAAAAAAABAAAAYAAAAAAAAAAA EAAAYgAAAAEAAAD+////AAAAAF8AAAD///////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////// //////////////////////////////////////////////////////////////////////////// ///////////////////////////////////////////////////////////////////////////s pcEAI2AJBAAA+FK/AAAAAAAAEAAAAAAABgAAnEIAAA4AYmpiaqEVoRUAAAAAAAAAAAAAAAAAAAAA AAAECBYAMlAAAMN/AADDfwAAQQ4AAAAAAAAPAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAD//w8AAAAA AAAAAAD//w8AAAAAAAAAAAD//w8AAAAAAAAAAAAAAAAAAAAAAKQAAAAAAEYEAAAAAAAARgQAAEYE AAAAAAAARgQAAAAAAABGBAAAAAAAAEYEAAAAAAAARgQAABQAAAAAAAAAAAAAAFoEAAAAAAAA4hsA AAAAAADiGwAAAAAAAOIbAAA4AAAAGhwAAHwAAACWHAAARAAAAFoEAAAAAAAABzcAAEgBAADmHAAA FgAAAPwcAAAAAAAA/BwAAAAAAAD8HAAAAAAAAPwcAAAAAAAA/BwAAAAAAAD8HAAAAAAAAPwcAAAA AAAAMjYAAAIAAAA0NgAAAAAAADQ2AAAAAAAANDYAAAAAAAA0NgAAAAAAADQ2AAAAAAAANDYAACQA AABPOAAAaAIAALc6AACOAAAAWDYAAGkAAAAAAAAAAAAAAAAAAAAAAAAARgQAAAAAAABHLAAAAAAA AAAAAAAAAAAAAAAAAAAAAAD8HAAAAAAAAPwcAAAAAAAARywAAAAAAABHLAAAAAAAAFg2AAAAAAAA
------=_NextPart_000_0017_01C93507.F6F64E30--
I just want to get:
From: xxxx.xxxx [mailto:[email protected]]
Sent: Thursday, October 23, 2008 11:37 AM
To: [email protected]
Subject: S/I for xxxxx (B/L
No.:4357-0120-810.044)
Pls find attached the xxxx.doc),
Thanks.
B.rgds,
xxx xxx
I'm not sure whether the mail is malformed. It seems that when the message also carries an HTML version, you have to go one level deeper, like this:
parts=self._currentEmailParsedInstance.get_payload()
print parts[0].get_content_type()
... multipart/alternative
textParts=parts[0].get_payload()
print textParts[0].get_content_type()
... text/plain
body=textParts[0].get_payload()
print body
... and I get the text without a problem!
Thank you so much, Vinko.
So it's kind of like dealing with XML: recursive in nature.
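The recursive structure can be followed with a small helper instead of hard-coded indexes. This is only a sketch, written for Python 3 (the snippets above are Python 2); first_plain_text and SAMPLE are made-up names for illustration, and the sample message is a cut-down stand-in for the email above.

```python
import email


def first_plain_text(part):
    """Return the payload of the first text/plain part, searching depth-first.

    Returns None when no text/plain part exists anywhere in the tree.
    """
    if part.is_multipart():
        # For a multipart message, get_payload() is a list of sub-messages.
        for child in part.get_payload():
            found = first_plain_text(child)
            if found is not None:
                return found
        return None
    if part.get_content_type() == "text/plain":
        return part.get_payload()
    return None


# Hypothetical sample: multipart/mixed wrapping multipart/alternative,
# the same nesting as the email in the question.
SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="outer"\n'
    "\n"
    "--outer\n"
    'Content-Type: multipart/alternative; boundary="inner"\n'
    "\n"
    "--inner\n"
    'Content-Type: text/plain; charset="us-ascii"\n'
    "\n"
    "Please find attached the example.doc\n"
    "--inner\n"
    'Content-Type: text/html; charset="us-ascii"\n'
    "\n"
    "<html><body><p>Please find attached the example.doc</p></body></html>\n"
    "--inner--\n"
    "--outer\n"
    "Content-Type: application/octet-stream\n"
    "\n"
    "placeholder attachment\n"
    "--outer--\n"
)

message = email.message_from_string(SAMPLE)
body = first_plain_text(message)
print(body)
```

Because the helper recurses on every multipart container, it does not matter how deeply the text/plain part is nested.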
Upvotes: 4
Views: 3665
Reputation: 10554
I ended up with this:
parser = email.parser.Parser()
self._email = parser.parse(open('/home/vinko/jlm.txt','r'))
parts = self._email.get_payload()
check = parts[0].get_content_type()
if check == "text/plain":
    return parts[0].get_payload()
elif check == "multipart/alternative":
    part = parts[0].get_payload()
    if part[0].get_content_type() == "text/plain":
        return part[0].get_payload()
    else:
        return "cannot obtain the body of the email"
else:
    return "cannot obtain the body of the email"
Upvotes: 0
Reputation: 340201
This will get you the contents of the message:
self.currentEmailParsedInstance.get_payload()
As for the text-only part, you will have to strip the HTML on your own, for example using BeautifulSoup.
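If installing BeautifulSoup is not an option, a rough equivalent can be sketched with the standard library's HTMLParser (Python 3 here). This is a swapped-in technique, not BeautifulSoup itself; TextExtractor and strip_html are made-up names, and the sketch does not skip the contents of script or style elements.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self.chunks.append(data)


def strip_html(html):
    """Return the text content of html with whitespace collapsed."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    # Join the text nodes, then squeeze runs of whitespace into one space.
    return " ".join("".join(extractor.chunks).split())


text = strip_html("<p>Please find attached the <b>example.doc</b>.</p>\n<p>Thanks.</p>")
print(text)
```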
See the documentation of the Message class that the Parser returns for more information. If you mean getting the text part of a message that contains both an HTML and a plain-text version of itself, you can pass an index to get_payload() to get the part you want.
I tried with a different MIME email because what you pasted seems malformed; hopefully it only got malformed when you edited it.
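A short sketch of the index form, in Python 3. The sample message is invented for illustration and assumes the plain-text alternative comes first, which is the usual order but not guaranteed for every sender.

```python
import email

# Hypothetical two-part message: plain text first, HTML second.
SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="alt"\n'
    "\n"
    "--alt\n"
    "Content-Type: text/plain\n"
    "\n"
    "Message Text\n"
    "--alt\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>Message Text</p>\n"
    "--alt--\n"
)

message = email.message_from_string(SAMPLE)

# get_payload(i) returns the i-th sub-part of a multipart message.
plain_part = message.get_payload(0)
html_part = message.get_payload(1)

print(plain_part.get_content_type())
print(plain_part.get_payload())
```

Checking get_content_type() on the part you picked is safer than trusting the position alone.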
>>> import email.parser
>>> parser = email.parser.Parser()
>>> message = parser.parse(open('/home/vinko/jlm.txt','r'))
>>> message.is_multipart()
True
>>> parts = message.get_payload()
>>> len(parts)
2
>>> parts[0].get_content_type()
'text/plain'
>>> parts[1].get_content_type()
'message/rfc822'
>>> parts[0].get_payload()
'Message Text'
parts will contain all parts of the multipart message; you can check their content types as shown and keep only the text/plain ones, for instance.
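One way to keep only the text/plain parts at any nesting depth is Message.walk(), which visits every part of the tree. The sketch below is Python 3; plain_text_parts and the sample message are made up for illustration. It also passes decode=True so that quoted-printable or base64 bodies are decoded, and it skips text files sent as attachments.

```python
import email


def plain_text_parts(message):
    """Return the decoded text of every inline text/plain part."""
    texts = []
    for part in message.walk():
        if part.get_content_type() != "text/plain":
            continue
        if part.get_content_disposition() == "attachment":
            continue  # a .txt attachment, not the body
        # decode=True undoes the Content-Transfer-Encoding and yields bytes.
        raw = part.get_payload(decode=True)
        charset = part.get_content_charset() or "us-ascii"
        texts.append(raw.decode(charset, errors="replace"))
    return texts


# Hypothetical sample: a quoted-printable body plus a text attachment.
SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="outer"\n'
    "\n"
    "--outer\n"
    'Content-Type: text/plain; charset="us-ascii"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "Total =3D 42\n"
    "--outer\n"
    "Content-Type: text/plain\n"
    'Content-Disposition: attachment; filename="notes.txt"\n'
    "\n"
    "attached notes\n"
    "--outer--\n"
)

texts = plain_text_parts(email.message_from_string(SAMPLE))
print(texts)
```

An unknown charset name would make decode() raise LookupError, so real mail may need a fallback there.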
Good luck.
Upvotes: 4