user3567592

Reputation: 43

How to parse emails faster than OpenPop.dll

I can already do this using OpenPop.dll:

    Pop3Client objPOP3Client = new Pop3Client();
    int intTotalEmail = 0;
    DataTable dtEmail = new DataTable();
    object[] objMessageParts;

    try
    {
        dtEmail = GetAllEmailStructure();

        if (objPOP3Client.Connected)
            objPOP3Client.Disconnect();

        objPOP3Client.Connect(strHostName, intPort, bulUseSSL);
        try
        {
            objPOP3Client.Authenticate(strUserName, new Common()._Decode(strPassword));
            intTotalEmail = objPOP3Client.GetMessageCount();

            AddMapping();

            for (int i = 1; i <= intTotalEmail; i++)
            {
                objMessageParts = GetMessageContent(i, ref objPOP3Client, dtExistMailList);

                if (objMessageParts != null && objMessageParts[0].ToString() == "0")
                {
                    AddToDtEmail(objMessageParts, i, dtEmail, dtUserList, dtTicketIDList, dtBlacklistEmails, dtBlacklistSubject, dtBlacklistDomains);
                }
            }
        }
        catch (Exception ex)
        {
            // Don't swallow authentication/retrieval errors silently.
            ParserLogError(ex, "GetAllEmail()");
        }
    }
    catch (Exception ex)
    {
        ParserLogError(ex, "GetAllEmail()");
    }
    finally
    {
        if (objPOP3Client.Connected)
            objPOP3Client.Disconnect();
    }

    // GetMessageContent(): downloads one message and extracts its parts

    public object[] GetMessageContent(int intMessageNumber, ref Pop3Client objPOP3Client, DataTable dtExistingMails)
    {
    object[] strArrMessage = new object[10];
    Message objMessage;
    MessagePart plainTextPart = null, HTMLTextPart = null;
    string strMessageId = "";

    try
    {
        strArrMessage[0] = "";
        strArrMessage[1] = "";
        strArrMessage[2] = "";
        strArrMessage[3] = "";
        strArrMessage[4] = "";
        strArrMessage[5] = "";
        strArrMessage[6] = "";
        strArrMessage[7] = "";
        strArrMessage[8] = "";

        objMessage = objPOP3Client.GetMessage(intMessageNumber);
        strMessageId = (objMessage.Headers.MessageId == null ? "" : objMessage.Headers.MessageId.Trim());

        if (!IsExistMessageID(dtExistingMails, strMessageId)) //check in data base message id is exists or not 
        {
            strArrMessage[0] = "0";
            strArrMessage[1] = objMessage.Headers.From.Address.Trim();     // From EMail Address
            strArrMessage[2] = objMessage.Headers.From.DisplayName.Trim(); // From EMail Name
            strArrMessage[3] = objMessage.Headers.Subject.Trim();// Mail Subject     
            plainTextPart = objMessage.FindFirstPlainTextVersion();
            strArrMessage[4] = (plainTextPart == null ? "" : plainTextPart.GetBodyAsText().Trim());
            HTMLTextPart = objMessage.FindFirstHtmlVersion();
            strArrMessage[5] = (HTMLTextPart == null ? "" : HTMLTextPart.GetBodyAsText().Trim());
            strArrMessage[6] = strMessageId;
            List<MessagePart> attachment = objMessage.FindAllAttachments();
            strArrMessage[7] = null;
            strArrMessage[8] = null;
            if (attachment.Count > 0)
            {
                if (attachment[0] != null && attachment[0].IsAttachment)
                {
                    strArrMessage[7] = attachment[0].FileName.Trim();
                    strArrMessage[8] = attachment[0];
                }
            }
        }
        else
        {
            strArrMessage[0] = "1";
        }
    }
    catch (Exception ex)
    {
        ParserLogError(ex, "GetMessageContent()");
    }
    return strArrMessage;
    }

But I want to make it faster than OpenPop.dll. Please let me know if there are any other techniques for parsing mail.

Please check the code above and then let me know.

Thanks in advance

Upvotes: 2

Views: 3696

Answers (1)

jstedfast

Reputation: 38608

> but, i want to make it faster than above OpenPop.dll. so please let me know if any other technique are there for parsing mails.

In your GetMessageContent() method, the one place that consumes the vast majority of the time is:

    objMessage = objPOP3Client.GetMessage(intMessageNumber);

The network I/O part of downloading a message cannot really be optimized, but OpenPop.NET's parser is slow (based on my own performance tests).

In my tests, MimeKit is 25x faster than OpenPop.NET at parsing email messages.

One of the main performance problems in OpenPop.NET's MIME parser is that it uses a StreamReader for parsing, which is slow because of unnecessary charset conversion, reading one line at a time, and so on. I have an analysis of another email library that uses StreamReader for parsing here: https://stackoverflow.com/a/18787176/87117
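Going through a StreamReader means every byte is decoded into chars before the parser even looks at it. A rough sketch of the alternative is to scan the raw bytes for line boundaries and leave decoding for later. RawLineSplitter and SplitLines are hypothetical names used only for illustration. They are not part of MimeKit or OpenPop.NET.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration: finds line boundaries directly in the
// raw message bytes, so no charset decoding happens while scanning.
public static class RawLineSplitter
{
    // Returns (Offset, Length) for each line in buffer[0..count), excluding the
    // '\n' terminator and a preceding '\r' (CRLF line endings).
    public static List<(int Offset, int Length)> SplitLines(byte[] buffer, int count)
    {
        var lines = new List<(int Offset, int Length)>();
        int start = 0;

        while (start < count)
        {
            // Plain byte search: no Decoder, no string allocation per line.
            int newline = Array.IndexOf(buffer, (byte) '\n', start, count - start);
            int end = newline == -1 ? count : newline;
            int length = end - start;

            if (length > 0 && buffer[end - 1] == (byte) '\r')
                length--;

            lines.Add((start, length));
            start = end + 1;
        }

        return lines;
    }
}
```

Only the header values or body parts that actually need text have to be decoded afterwards, with whatever Encoding they declare, instead of decoding the whole message up front.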

Then there's the problem that OpenPop.NET's parser also uses Regex to remove CFWS (Comments and Folding White Space) from a header string before parsing/decoding it. This is expensive. It's far better to write a good tokenizer that can deal with CFWS.
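As an illustration of the tokenizer approach, a hand-written CFWS skipper can handle whitespace, folded lines, nested comments, and backslash-escaped characters in a single pass, with no Regex. CfwsTokenizer, SkipCfws, and ReadToken are hypothetical names for this sketch. It is simplified (ReadToken does not handle quoted strings or specials) and is not MimeKit's tokenizer.

```csharp
// Hypothetical helper for illustration: skips RFC 5322 CFWS (comments and
// folding white space) with a hand-written scanner instead of a Regex.
public static class CfwsTokenizer
{
    // Folding white space: spaces, tabs, and the CR/LF of a folded header line.
    static bool IsFws(char c) => c == ' ' || c == '\t' || c == '\r' || c == '\n';

    // Advances index past any FWS and (possibly nested) comments.
    // Returns false if a comment is not terminated before the end of text.
    public static bool SkipCfws(string text, ref int index)
    {
        while (index < text.Length)
        {
            if (IsFws(text[index]))
                index++;
            else if (text[index] == '(')
            {
                if (!SkipComment(text, ref index))
                    return false;
            }
            else
                break;
        }

        return true;
    }

    // Expects text[index] == '('; on success, leaves index just past the matching ')'.
    static bool SkipComment(string text, ref int index)
    {
        int depth = 0;

        while (index < text.Length)
        {
            char c = text[index++];

            if (c == '\\')
            {
                // quoted-pair: the next character is literal, even '(' or ')'
                if (index < text.Length)
                    index++;
            }
            else if (c == '(')
                depth++;
            else if (c == ')' && --depth == 0)
                return true;
        }

        return false;
    }

    // Returns the next run of non-FWS, non-comment characters, or null at the end.
    // Simplified: quoted strings and specials such as '<' or ',' are not handled.
    public static string ReadToken(string text, ref int index)
    {
        if (!SkipCfws(text, ref index) || index >= text.Length)
            return null;

        int start = index;
        while (index < text.Length && !IsFws(text[index]) && text[index] != '(')
            index++;

        return text.Substring(start, index - start);
    }
}
```

For a header value like `1.0 (produced by MetaSend Vx.x)`, ReadToken returns "1.0" and the next call returns null, because the trailing comment is skipped as CFWS.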

If you are interested in some of the other techniques I used to make MimeKit this fast (as fast as, or faster than, highly optimized C implementations), I wrote some blog posts about them:

Optimization Tricks used by MimeKit: Part 1

In short, the optimization I talk about in part 1 replaces loops like this one, which scan for the end of a line:

    while (*inptr != (byte) '\n')
        inptr++;

with a faster loop, like this:

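    // Assumes the buffer ends with a '\n' sentinel, so the scan cannot run off the end.
    uint* dword = (uint*) inptr;
    uint mask;

    do {
        // XOR turns any '\n' byte into 0x00; the next line is non-zero
        // only if at least one of the 4 bytes is zero.
        mask = *dword++ ^ 0x0A0A0A0A;
        mask = ((mask - 0x01010101) & (~mask & 0x80808080));
    } while (mask == 0);

    // The '\n' is in the word just read; find its exact byte.
    inptr = (byte*) (dword - 1);
    while (*inptr != (byte) '\n')
        inptr++;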

which improved performance by 20% (although on non-x86 architectures, it requires 'dword' to be 4-byte aligned).
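The same word-at-a-time idea can also be written without unsafe pointers. The sketch below uses a hypothetical NewlineScanner class. It reads 4 bytes at a time with BitConverter.ToUInt32, applies the XOR plus "has zero byte" test to decide whether the word contains a '\n', and only then falls back to a byte scan. Because it checks the buffer length itself, it needs no sentinel. The bounds checks and BitConverter calls likely give up some of the pointer version's speed.

```csharp
using System;

// Hypothetical safe (no unsafe code) version of the word-at-a-time newline scan.
public static class NewlineScanner
{
    // XOR maps each '\n' (0x0A) byte to 0x00; (x - 0x01010101) & ~x & 0x80808080
    // is then non-zero exactly when at least one byte of x is zero.
    public static bool ContainsNewline(uint word)
    {
        uint x = word ^ 0x0A0A0A0Au;
        return ((x - 0x01010101u) & ~x & 0x80808080u) != 0;
    }

    // Returns the index of the first '\n' at or after start, or -1 if none.
    public static int IndexOfNewline(byte[] buffer, int start)
    {
        int i = start;

        // Test 4 bytes per iteration while a whole word is available.
        while (i + 4 <= buffer.Length)
        {
            if (ContainsNewline(BitConverter.ToUInt32(buffer, i)))
                break; // the '\n' is one of buffer[i..i+3]
            i += 4;
        }

        // Byte-by-byte scan: either within the matching word or over the tail.
        for (; i < buffer.Length; i++)
        {
            if (buffer[i] == (byte) '\n')
                return i;
        }

        return -1;
    }
}
```

Byte order does not matter here: the test only asks whether any of the 4 bytes is '\n', and the final byte loop finds which one.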

Optimization Tricks used by MimeKit: Part 2

In part 2, I talk about writing a more optimized version of System.IO.MemoryStream. The problem with MemoryStream is that it has to keep its content in one contiguous block of memory. As you write more data to it and it has to resize its internal byte array, it must copy the existing content into the new array, which is expensive, especially once the stream holds a lot of data.

To work around this performance bottleneck, I wrote a MemoryBlockStream that does not need a contiguous block of memory: it uses a linked list of byte arrays. Instead of resizing the byte array when the current buffer overflows, it simply allocates another 2048-byte array for the data to overflow into and appends it to the linked list.
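A much-simplified sketch of that idea follows. ChunkedBuffer is a hypothetical class that stores data in a linked list of 2048-byte blocks, so writing past the end of the current block appends a new block instead of reallocating and copying everything written so far. It is not a Stream subclass and is not MimeKit's MemoryBlockStream. Reading, seeking, and disposal are left out.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified illustration: data lives in a linked list of
// fixed-size blocks, so growing the buffer never copies existing data.
public sealed class ChunkedBuffer
{
    public const int BlockSize = 2048;

    readonly LinkedList<byte[]> blocks = new LinkedList<byte[]>();
    int usedInLastBlock = BlockSize; // forces a block allocation on the first write

    public long Length { get; private set; }

    public int BlockCount => blocks.Count;

    public void Write(byte[] data, int offset, int count)
    {
        while (count > 0)
        {
            if (usedInLastBlock == BlockSize)
            {
                // Current block is full: append a new one instead of resizing.
                blocks.AddLast(new byte[BlockSize]);
                usedInLastBlock = 0;
            }

            int n = Math.Min(count, BlockSize - usedInLastBlock);
            Buffer.BlockCopy(data, offset, blocks.Last.Value, usedInLastBlock, n);

            usedInLastBlock += n;
            offset += n;
            count -= n;
            Length += n;
        }
    }

    // Copies everything into one contiguous array, only when a caller asks for it.
    public byte[] ToArray()
    {
        var result = new byte[Length];
        long position = 0;

        foreach (var block in blocks)
        {
            int n = (int) Math.Min(BlockSize, Length - position);
            Buffer.BlockCopy(block, 0, result, (int) position, n);
            position += n;
        }

        return result;
    }
}
```

Writing 5000 bytes this way touches three blocks (2048 + 2048 + 904 bytes), and none of the earlier data is copied as the buffer grows.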

Note: MimeKit itself only does email parsing; it doesn't do POP3, SMTP, or IMAP. If you want that kind of functionality, I've also written a library built on MimeKit that does: MailKit

Update:

Sample code using MailKit (as requested) to download/parse all messages:

    using System;

    using MailKit.Net.Pop3;
    using MailKit;
    using MimeKit;

    namespace TestClient {
        class Program
        {
            public static void Main (string[] args)
            {
                using (var client = new Pop3Client ()) {
                    client.Connect ("pop.gmail.com", 995, true);

                    // Note: since we don't have an OAuth2 token, disable
                    // the XOAUTH2 authentication mechanism.
                    client.AuthenticationMechanisms.Remove ("XOAUTH2");

                    client.Authenticate ("[email protected]", "password");

                    int count = client.GetMessageCount ();
                    for (int i = 0; i < count; i++) {
                        var message = client.GetMessage (i);
                        Console.WriteLine ("Subject: {0}", message.Subject);
                    }

                    client.Disconnect (true);
                }
            }
        }
    }

Upvotes: 7
