Reputation: 88

How to obtain the WYSIWYG body of an email message using MimeKit

I was using a library called EAGetMail to retrieve the body of a specified email, and it was working well. However, I am now using MailKit. The problem is that EAGetMail's equivalent of message.Body returns the body as the user sees it in email clients, whereas in MailKit it returns a lot of different data.

This is the relevant code:

using (var client = new ImapClient())
{
    client.Connect(emailServer, 993, true);
    client.AuthenticationMechanisms.Remove("XOAUTH2");
    client.Authenticate(username, password);
    var inbox = client.Inbox;
    inbox.Open(FolderAccess.ReadOnly);
    SearchQuery query;
    if (checkBox.IsChecked == false)
    {
        query = SearchQuery.DeliveredBefore((DateTime)dateEnd).And(
            SearchQuery.DeliveredAfter((DateTime)dateStart)).And(
            SearchQuery.SubjectContains("Subject to find"));
    }
    else
    {
        query = SearchQuery.SubjectContains("Subject to find");
    }
    foreach (var uid in inbox.Search(query))
    {
        var message = inbox.GetMessage(uid);
        formEmails.Add(message.TextBody);
        messageDate.Add(message.Date.LocalDateTime);
    }
    client.Disconnect(true);
}

I also tried message.Body.ToString() and searching through the message parts for plain text, but neither worked. My question is: how do I replicate the effect of EAGetMail's .body property using MailKit, so that it returns only the body contents in plain text, as the user sees them?

Upvotes: 4

Views: 4408

Answers (2)

Vinnie Amir

Reputation: 621

This is an old post, but it is still relevant: you can use MimeKit's built-in GetTextBody method to get the body as text:

string body = mimeMessage.GetTextBody(MimeKit.Text.TextFormat.Plain);

Upvotes: 0

jstedfast

Reputation: 38573

A common misunderstanding about email is that there is a well-defined message body and then a list of attachments. This is not really the case. The reality is that MIME is a tree structure of content, much like a file system.

Luckily, MIME does define a set of general rules for how mail clients should interpret this tree structure of MIME parts. The Content-Disposition header is meant to provide hints to the receiving client as to which parts are meant to be displayed as part of the message body and which are meant to be interpreted as attachments.

The Content-Disposition header will generally have one of two values: inline or attachment.

The meaning of these values should be fairly obvious. If the value is attachment, then the content of said MIME part is meant to be presented as a file attachment separate from the core message. However, if the value is inline, then the content of that MIME part is meant to be displayed inline within the mail client's rendering of the core message body. If the Content-Disposition header does not exist, then it should be treated as if the value were inline.

Technically, then, every part that lacks a Content-Disposition header or that is marked as inline is part of the core message body.
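As a rough illustration of that rule, the sketch below sorts a message's leaf parts into body parts and attachments using MimeKit's ContentDisposition property. The names DispositionExample and Classify are made up for this example, and it applies only the Content-Disposition rule, nothing more.

```csharp
using System.Collections.Generic;
using MimeKit;

static class DispositionExample
{
    // Sorts the leaf parts of a message into "body" and "attachment" buckets
    // using only the Content-Disposition rule: a part with no header, or with
    // a value of "inline", is treated as part of the message body.
    public static void Classify (MimeMessage message, IList<MimeEntity> body, IList<MimeEntity> attachments)
    {
        foreach (var part in message.BodyParts) {
            var disposition = part.ContentDisposition;

            if (disposition != null && disposition.IsAttachment)
                attachments.Add (part);
            else
                body.Add (part);
        }
    }
}
```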

There's a bit more to it than that, though.

Modern MIME messages will often contain a multipart/alternative MIME container which will generally contain a text/plain and text/html version of the text that the sender wrote. The text/html version is typically formatted much closer to what the sender saw in his or her WYSIWYG editor than the text/plain version.

The reason for sending the message text in both formats is that not all mail clients are capable of displaying HTML.

The receiving client should only display one of the alternative views contained within the multipart/alternative container. Since alternative views are listed in order of least faithful to most faithful with what the sender saw in his or her WYSIWYG editor, the receiving client should walk over the list of alternative views starting at the end and working backwards until it finds a part that it is capable of displaying.

Example:

multipart/alternative
  text/plain
  text/html

As seen in the example above, the text/html part is listed last because it is the most faithful to what the sender saw in his or her WYSIWYG editor when writing the message.
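The backwards walk can be sketched in a few lines. This is a simplified illustration rather than a complete renderer: AlternativeExample, PickAlternative and the canDisplayHtml flag are hypothetical names, and any alternative that is not a plain text part is skipped.

```csharp
using MimeKit;

static class AlternativeExample
{
    // Walks the alternatives from last (most faithful) to first (least
    // faithful) and returns the first text part this client can display.
    // Alternatives that are not TextPart instances are skipped for brevity.
    public static TextPart PickAlternative (MultipartAlternative alternative, bool canDisplayHtml)
    {
        for (int i = alternative.Count - 1; i >= 0; i--) {
            var text = alternative[i] as TextPart;

            if (text == null)
                continue;

            if (text.IsHtml && !canDisplayHtml)
                continue;

            return text;
        }

        // nothing displayable was found
        return null;
    }
}
```

A client that cannot render HTML would pass false for canDisplayHtml and end up with the text/plain part.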

To make matters even more complicated, sometimes modern mail clients will use a multipart/related MIME container instead of a simple text/html part in order to embed images and other multimedia content within the HTML.

Example:

multipart/alternative
  text/plain
  multipart/related
    text/html
    image/jpeg
    video/mp4
    image/png

In the example above, one of the alternative views is a multipart/related container which contains an HTML version of the message body that references the sibling video and images.

Now that you have a rough idea of how a message is structured and how to interpret various MIME entities, we can start figuring out how to actually render the message as intended.

Using a MimeVisitor (the most accurate way of rendering a message)

MimeKit includes a MimeVisitor class for visiting each node in the MIME tree structure. For example, the following MimeVisitor subclass could be used to generate HTML to be rendered by a browser control (such as WebBrowser):

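using System;
using System.Collections.Generic;
using System.IO;
using MimeKit;
using MimeKit.Text;
using MimeKit.Tnef;

/// <summary>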
/// Visits a MimeMessage and generates HTML suitable to be rendered by a browser control.
/// </summary>
class HtmlPreviewVisitor : MimeVisitor
{
    List<MultipartRelated> stack = new List<MultipartRelated> ();
    List<MimeEntity> attachments = new List<MimeEntity> ();
    readonly string tempDir;
    string body;

    /// <summary>
    /// Creates a new HtmlPreviewVisitor.
    /// </summary>
    /// <param name="tempDirectory">A temporary directory used for storing image files.</param>
    public HtmlPreviewVisitor (string tempDirectory)
    {
        tempDir = tempDirectory;
    }

    /// <summary>
    /// The list of attachments that were in the MimeMessage.
    /// </summary>
    public IList<MimeEntity> Attachments {
        get { return attachments; }
    }

    /// <summary>
    /// The HTML string that can be set on the BrowserControl.
    /// </summary>
    public string HtmlBody {
        get { return body ?? string.Empty; }
    }

    protected override void VisitMultipartAlternative (MultipartAlternative alternative)
    {
        // walk the multipart/alternative children backwards from greatest level of faithfulness to the least faithful
        for (int i = alternative.Count - 1; i >= 0 && body == null; i--)
            alternative[i].Accept (this);
    }

    protected override void VisitMultipartRelated (MultipartRelated related)
    {
        var root = related.Root;

        // push this multipart/related onto our stack
        stack.Add (related);

        // visit the root document
        root.Accept (this);

        // pop this multipart/related off our stack
        stack.RemoveAt (stack.Count - 1);
    }

    // look up the image based on the img src url within our multipart/related stack
    bool TryGetImage (string url, out MimePart image)
    {
        UriKind kind;
        int index;
        Uri uri;

        if (Uri.IsWellFormedUriString (url, UriKind.Absolute))
            kind = UriKind.Absolute;
        else if (Uri.IsWellFormedUriString (url, UriKind.Relative))
            kind = UriKind.Relative;
        else
            kind = UriKind.RelativeOrAbsolute;

        try {
            uri = new Uri (url, kind);
        } catch {
            image = null;
            return false;
        }

        for (int i = stack.Count - 1; i >= 0; i--) {
            if ((index = stack[i].IndexOf (uri)) == -1)
                continue;

            image = stack[i][index] as MimePart;
            return image != null;
        }

        image = null;

        return false;
    }

    // Save the image to our temp directory and return a "file://" url suitable for
    // the browser control to load.
    // Note: if you'd rather embed the image data into the HTML, you can construct a
    // "data:" url instead.
    string SaveImage (MimePart image, string url)
    {
        string fileName = url.Replace (':', '_').Replace ('\\', '_').Replace ('/', '_');

        string path = Path.Combine (tempDir, fileName);

        if (!File.Exists (path)) {
            using (var output = File.Create (path))
                // MimeKit releases before 2.0 name this property ContentObject
                image.Content.DecodeTo (output);
        }

        return "file://" + path.Replace ('\\', '/');
    }

    // Replaces <img src=...> urls that refer to images embedded within the message with
    // "file://" urls that the browser control will actually be able to load.
    void HtmlTagCallback (HtmlTagContext ctx, HtmlWriter htmlWriter)
    {
        if (ctx.TagId == HtmlTagId.Image && !ctx.IsEndTag && stack.Count > 0) {
            ctx.WriteTag (htmlWriter, false);

            // replace the src attribute with a file:// URL
            foreach (var attribute in ctx.Attributes) {
                if (attribute.Id == HtmlAttributeId.Src) {
                    MimePart image;
                    string url;

                    if (!TryGetImage (attribute.Value, out image)) {
                        htmlWriter.WriteAttribute (attribute);
                        continue;
                    }

                    url = SaveImage (image, attribute.Value);

                    htmlWriter.WriteAttributeName (attribute.Name);
                    htmlWriter.WriteAttributeValue (url);
                } else {
                    htmlWriter.WriteAttribute (attribute);
                }
            }
        } else if (ctx.TagId == HtmlTagId.Body && !ctx.IsEndTag) {
            ctx.WriteTag (htmlWriter, false);

            // add and/or replace oncontextmenu="return false;"
            foreach (var attribute in ctx.Attributes) {
                if (attribute.Name.ToLowerInvariant () == "oncontextmenu")
                    continue;

                htmlWriter.WriteAttribute (attribute);
            }

            htmlWriter.WriteAttribute ("oncontextmenu", "return false;");
        } else {
            // pass the tag through to the output
            ctx.WriteTag (htmlWriter, true);
        }
    }

    protected override void VisitTextPart (TextPart entity)
    {
        TextConverter converter;

        if (body != null) {
            // since we've already found the body, treat this as an attachment
            attachments.Add (entity);
            return;
        }

        if (entity.IsHtml) {
            converter = new HtmlToHtml {
                HtmlTagCallback = HtmlTagCallback
            };
        } else if (entity.IsFlowed) {
            var flowed = new FlowedToHtml ();
            string delsp;

            if (entity.ContentType.Parameters.TryGetValue ("delsp", out delsp))
                flowed.DeleteSpace = delsp.ToLowerInvariant () == "yes";

            converter = flowed;
        } else {
            converter = new TextToHtml ();
        }

        body = converter.Convert (entity.Text);
    }

    protected override void VisitTnefPart (TnefPart entity)
    {
        // extract any attachments in the MS-TNEF part
        attachments.AddRange (entity.ExtractAttachments ());
    }

    protected override void VisitMessagePart (MessagePart entity)
    {
        // treat message/rfc822 parts as attachments
        attachments.Add (entity);
    }

    protected override void VisitMimePart (MimePart entity)
    {
        // realistically, if we've gotten this far, then we can treat this as an attachment
        // even if the IsAttachment property is false.
        attachments.Add (entity);
    }
}

And the way you'd use this visitor might look something like this:

void Render (MimeMessage message)
{
    var tmpDir = Path.Combine (Path.GetTempPath (), message.MessageId);
    var visitor = new HtmlPreviewVisitor (tmpDir);

    Directory.CreateDirectory (tmpDir);

    message.Accept (visitor);

    DisplayHtml (visitor.HtmlBody);
    DisplayAttachments (visitor.Attachments);
}

Using the TextBody and HtmlBody Properties (the easiest way)

To simplify the common task of getting the text of a message, MimeMessage includes two properties that can help you get the text/plain or text/html version of the message body. These are TextBody and HtmlBody, respectively.

Keep in mind, however, that the HTML part returned by the HtmlBody property may be a child of a multipart/related, allowing it to refer to images and other types of media that are also contained within that multipart/related entity. These properties are only a convenience and are not really a good substitute for traversing the MIME structure yourself so that you can properly interpret related content.
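For code like that in the question, which only needs display text, a fallback along these lines may be enough. This is a minimal sketch: BodyTextExample and GetDisplayText are hypothetical names, and when the message has no text/plain part the helper returns raw HTML markup, so converting that markup to plain text is left to the caller.

```csharp
using MimeKit;

static class BodyTextExample
{
    // Prefers the text/plain body and falls back to the text/html body.
    // Both properties return null when no matching part exists.
    public static string GetDisplayText (MimeMessage message)
    {
        return message.TextBody ?? message.HtmlBody ?? string.Empty;
    }
}
```

In the question's loop, formEmails.Add (BodyTextExample.GetDisplayText (message)) would take the place of formEmails.Add (message.TextBody).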

Upvotes: 13
