Emails' rich characters are mistranslated when read from database using MimeKit

I have a Windows service written in VB.NET that downloads emails into MimeMessage objects, removes their attachments, and then writes what remains of each email to a SQL Server database. A separate ASP.NET application (also VB.NET) reads the email back into a MimeMessage object and returns it to the user on request.

Somewhere in this process, non-ASCII characters such as curly quotes, dashes, and bullets are replaced by strange character sequences in the output.

This question (Content encoding using MimeKit/MailKit) seemed promising, but changing the character encoding from ASCII to UTF-8 (among others) didn't solve it.

Here’s the code that saves the email to the database:

Sub ImportEmail(exConnectionString As String)
    Dim oClient As New Pop3Client()
    ' … email connection code removed …
    Dim message = oClient.GetMessage(0)
    Dim strippedMessage As MimeMessage = message
    ' … code to remove attachments removed …
    Dim mem As New MemoryStream
    strippedMessage.WriteTo(mem)
    Dim bytes = mem.ToArray
    Dim con As New SqlConnection(exConnectionString)
    con.Open()
    Dim com As New SqlCommand("INSERT INTO Emails (Body) VALUES (@RawDocument)", con)
    com.CommandType = CommandType.Text
    com.Parameters.AddWithValue("@RawDocument", bytes)
    com.ExecuteNonQuery()
    con.Close()
End Sub

And here’s the ASP.Net code to read it back to the user:


Private Sub OutputEmail(exConnectionString As String)
    Dim BlobString As String = ""
    Dim Sql As String = "SELECT Body FROM Emails WHERE Id = @id"    
    Dim com As New SqlClient.SqlCommand(Sql)
    com.CommandType = CommandType.Text
    com.Parameters.AddWithValue("@id", ViewState("email_id")) 

    Dim con As New SqlConnection(exConnectionString)
    con.Open()
    com.Connection = con
    Dim da As New SqlClient.SqlDataAdapter(com)
    Dim dt As New DataTable()
    da.Fill(dt)
    con.Close()

    If dt.Rows.Count > 0 Then
        Dim Row = dt.Rows(0)
        BlobString = Row(0).ToString()

        Dim MemStream As MemoryStream = GetMemoryStreamFromASCIIEncodedString(BlobString)
        Dim message As MimeMessage = MimeMessage.Load(MemStream)

        BodyBuilder.HtmlBody = message.HtmlBody
        BodyBuilder.TextBody = message.TextBody
        message.Body = BodyBuilder.ToMessageBody()

        Response.ContentType = "message/rfc822"
        Response.AddHeader("Content-Disposition", "attachment;filename=""" & Left(message.Subject, 35) & ".eml""")
        Response.Write(message)
        Response.End()
    End If
End Sub

Private Function GetMemoryStreamFromASCIIEncodedString(ByVal BlobString As String) As MemoryStream
    Dim BlobStream As Byte() = Encoding.ASCII.GetBytes(BlobString) ' **
    Dim MemStream As MemoryStream = New MemoryStream()
    MemStream.Write(BlobStream, 0, BlobStream.Length)
    MemStream.Position = 0
    Return MemStream
End Function

For example, let’s say the text below appears in the original email:

“So long and thanks for all the fish” (fancy quotes)

When read back, it appears as follows:

â€œSo long and thanks for all the fishâ€

Other character replacements are as follows:

– (long dash) becomes â€“

• (bullets) become â€¢

Upvotes: 1

Views: 270

Answers (1)

jstedfast

Reputation: 38538

The problem is with the following snippet, which converts the stored bytes to a String before parsing them:

If dt.Rows.Count > 0 Then
    Dim Row = dt.Rows(0)
    BlobString = Row(0).ToString() ' <-- the ToString() is the problem

    Dim MemStream As MemoryStream = GetMemoryStreamFromASCIIEncodedString(BlobString)
    Dim message As MimeMessage = MimeMessage.Load(MemStream)
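
The replacements reported in the question are consistent with UTF-8 bytes being decoded with a single-byte code page such as Windows-1252 somewhere on the read path. The console sketch below only illustrates that mechanism; it is not the asker's code. The module name `MojibakeDemo`, the sample text, and the choice of code page 1252 are assumptions, and it targets .NET Framework (on .NET Core or later, code page 1252 is only available after registering `CodePagesEncodingProvider.Instance` with `Encoding.RegisterProvider`).

```vb
Imports System
Imports System.Text

Module MojibakeDemo
    Sub Main()
        ' Let the console show non-ASCII characters where the host supports it.
        Console.OutputEncoding = Encoding.UTF8

        ' A left curly quote, an en dash and a bullet, built from code points
        ' so the example does not depend on the source file's encoding.
        Dim original As String = ChrW(&H201C) & " " & ChrW(&H2013) & " " & ChrW(&H2022)

        ' Stand-in for the stored message bytes (assumes a UTF-8 encoded body).
        Dim stored As Byte() = Encoding.UTF8.GetBytes(original)

        ' Assumption: the bytes get decoded as Windows-1252 instead of UTF-8.
        Dim misread As String = Encoding.GetEncoding(1252).GetString(stored)
        Console.WriteLine(misread)

        ' Decoding the untouched bytes with the correct charset is lossless.
        Dim roundTripped As String = Encoding.UTF8.GetString(stored)
        Console.WriteLine(roundTripped = original)
    End Sub
End Module
```

Each of the three sample characters is three bytes long in UTF-8, so the misread line shows each one as a three-character sequence beginning with â€, while the UTF-8 decode returns the original text.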

To fix the data corruption, read the column as a byte array and load the message directly from those bytes:

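If dt.Rows.Count > 0 Then
    Dim Row = dt.Rows(0)
    ' Keep the value as raw bytes; never convert it to a String.
    ' (Assumes the Body column is a binary type such as varbinary(max).)
    Dim BlobBytes As Byte() = DirectCast(Row(0), Byte())

    ' Wrap the bytes in a read-only stream and parse them directly.
    Dim MemStream As New MemoryStream(BlobBytes, False)
    Dim message As MimeMessage = MimeMessage.Load(MemStream)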

You can also get rid of your GetMemoryStreamFromASCIIEncodedString function.

(Note: I don't know VB.NET, so I'm just guessing at the syntax, but it should be pretty close to being right)
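
As a self-contained check of the idea above, the following console sketch serializes a message to bytes, reloads it from those same bytes, and tests whether a curly quote in the body survives. It assumes the MimeKit NuGet package is referenced. The module name, the helper names (`ToBytes`, `FromBytes`), and the example addresses are hypothetical, and the byte array stands in for the value read from the database column.

```vb
Imports System
Imports System.IO
Imports MimeKit

Module ByteRoundTripDemo
    ' Serialize a message to the raw bytes that would go into a binary column.
    Function ToBytes(message As MimeMessage) As Byte()
        Using mem As New MemoryStream()
            message.WriteTo(mem)
            Return mem.ToArray()
        End Using
    End Function

    ' Parse a message straight from the stored bytes, with no String in between.
    Function FromBytes(stored As Byte()) As MimeMessage
        Using mem As New MemoryStream(stored, False)
            Return MimeMessage.Load(mem)
        End Using
    End Function

    Sub Main()
        ' Build a small message whose body contains curly quotes.
        Dim message As New MimeMessage()
        message.From.Add(New MailboxAddress("Sender", "sender@example.com"))
        message.To.Add(New MailboxAddress("Recipient", "recipient@example.com"))
        message.Subject = "Round trip"
        message.Body = New TextPart("plain") With {
            .Text = ChrW(&H201C) & "So long and thanks for all the fish" & ChrW(&H201D)
        }

        Dim stored As Byte() = ToBytes(message)
        Dim reloaded As MimeMessage = FromBytes(stored)

        ' Check that the left curly quote is still present after the round trip.
        Console.WriteLine(reloaded.TextBody.IndexOf(ChrW(&H201C)) >= 0)
    End Sub
End Module
```

In the ASP.NET page, the same principle may apply to the output step: `message.WriteTo(Response.OutputStream)` sends the serialized bytes, whereas `Response.Write(message)` goes through `ToString()` first. This has not been tested against the asker's setup.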

Upvotes: 1
