Reputation: 513
I know that this command fetches the whole message body:
[imap_code] UID FETCH [uid] BODY.PEEK[TEXT]
This returns the entire message body, but I need to exclude the attachments. I want only the message the sender wrote, as text and/or HTML.
Is there a way to do that?
This is a full raw HTML mail with an attachment
I would like to get only
<div dir="ltr">This is the message body<div><ul><li>one</li><li>two</li></ul></div></div>
or the plain text if there is no HTML version.
Upvotes: 2
Views: 6283
Reputation: 10985
Messages are laid out as an arbitrary tree of parts: the parent nodes are of type multipart/* or message/rfc822, and the children are of other types. FETCH BODY[...] lets you extract any of these parts individually.
Unfortunately, there is no standard layout for messages. You can fetch the BODYSTRUCTURE item to get the MIME layout of a message, but it is very difficult to parse by eye.
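As a rough illustration of the two fetches involved, here is a minimal Python sketch using the standard imaplib module. The helper names (uid_fetch, fetch_structure, fetch_part) and the host in the comments are made up for this example, and it assumes the server returns the part as a literal, which is what imaplib usually hands back as a (header, payload) tuple. The BODYSTRUCTURE line is returned unparsed, since the standard library has no parser for it.

```python
import imaplib

def uid_fetch(conn, uid, item):
    """Run UID FETCH for one message and return imaplib's raw response list."""
    status, data = conn.uid("FETCH", str(uid), f"({item})")
    if status != "OK" or not data or data[0] is None:
        raise imaplib.IMAP4.error(f"UID FETCH {uid} ({item}) failed")
    return data

def fetch_structure(conn, uid):
    """Return the unparsed BODYSTRUCTURE response for a message."""
    return uid_fetch(conn, uid, "BODYSTRUCTURE")[0]

def fetch_part(conn, uid, section):
    """Return the raw (still transfer-encoded) bytes of one MIME part.

    BODY.PEEK is used so the message is not marked as \\Seen.
    """
    data = uid_fetch(conn, uid, f"BODY.PEEK[{section}]")
    # Assumption: the part comes back as a literal, i.e. a (header, payload) tuple.
    return data[0][1]

# Hypothetical usage against a real server:
#   conn = imaplib.IMAP4_SSL("imap.example.com")
#   conn.login(user, password)
#   conn.select("INBOX", readonly=True)
#   print(fetch_structure(conn, 42))
#   html = fetch_part(conn, 42, "1.2")
```

The bytes returned by fetch_part are still in the part's transfer encoding (base64 or quoted-printable, for instance), so they may need decoding before display.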
That being said, there are a few common message layouts, and handling them will get you most of the way.
The easiest is a message with just one body, either text/html or text/plain. Just fetch BODY[TEXT].
The next is multi-format, with both text/html and text/plain. Its MIME structure generally looks like this:
+ multipart/alternative [TEXT]
|- text/plain [1]
\- text/html [2]
In this case you want to fetch BODY[2] for the HTML version.
If the message is single-body, with attachments, it will look something like this:
+ multipart/mixed or multipart/related [TEXT]
|- text/html or text/plain [1]
|- image/jpg [2]
| ...
\- image/gif
In this case you want BODY[1].
Last is both of these: multi-format body with attachments. It will tend to look something like:
+ multipart/mixed or multipart/related [TEXT]
|-+ multipart/alternative [1]
| |- text/plain [1.1]
| \- text/html [1.2]
|- image/jpeg [2]
|- image/gif [3]
|...
\- image/png
In this case, you probably want BODY[1.2]. Your sample message is of this type.
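The part numbering in the layouts above can be reproduced offline with Python's standard email package, which is a handy way to check which section a given layout leads to. The sketch below is only a heuristic: the function names are invented for this example, it prefers the first text/html part and falls back to the first text/plain part, it skips anything marked as an attachment, and it does not descend into message/rfc822 parts.

```python
from email.message import EmailMessage

def text_parts(part, section=""):
    """Yield (imap_section, content_type) for each non-attachment text part."""
    if part.get_content_maintype() == "multipart":
        # Children of a multipart are numbered from 1, joined with dots.
        for index, child in enumerate(part.get_payload(), start=1):
            child_section = f"{section}.{index}" if section else str(index)
            yield from text_parts(child, child_section)
    elif (part.get_content_type() in ("text/plain", "text/html")
          and part.get_content_disposition() != "attachment"):
        # A message that is not multipart at all is addressed as part 1.
        yield (section or "1", part.get_content_type())

def preferred_body_section(message):
    """Return the section of the HTML body, else the plain body, else None."""
    candidates = list(text_parts(message))
    for wanted in ("text/html", "text/plain"):
        for section, content_type in candidates:
            if content_type == wanted:
                return section
    return None

# Build the last layout: multipart/mixed > multipart/alternative + image.
msg = EmailMessage()
msg.set_content("This is the message body")
msg.add_alternative("<div>This is the message body</div>", subtype="html")
msg.add_attachment(b"\x89PNG", maintype="image", subtype="png",
                   filename="a.png")

section = preferred_body_section(msg)
print(f"UID FETCH <uid> BODY.PEEK[{section}]")
```

Against a live server you would derive the same tree from the BODYSTRUCTURE response instead of downloading the whole message, which avoids pulling the attachments at all.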
Upvotes: 16