user6086585
Reputation: 39

Get the text from Email Attachments and write to a file

I am working with Microsoft Visual Studio 2015 and I am trying to convert something from list form to string form. I have found some solutions for similar problems but not this one in particular.

I would like to eventually have this code work:

Dim info As Byte() = New UTF8Encoding(True).GetBytes(Utilities.GetEmailInfo(msg.Message).Attachments)

My end goal is to take the text from Attachments and write it to a file. If I use one of the String fields listed below, such as ToData, the file turns out properly, but the code above fails because GetBytes expects a String and Attachments is a list. Is there another function I could use to get the text from the list?

The class that I need to convert contains the following:

Public Class EmailInfo
    Public FromData As String = vbNullString                            'FROM:
    Public ToData As String = vbNullString                              'TO:
    Public DateData As String = vbNullString                            'DATE:
    Public SubjectData As String = vbNullString                         'SUBJECT:
    Public MessageBody As EmailItem                                     'contents of message body
    Public AlternateViews As New Collections.Generic.List(Of EmailItem) 'list of alternate views
    Public Attachments As New Collections.Generic.List(Of EmailItem)    'list of attachments
End Class

The resource that I want to access is EmailInfo.Attachments. This resource is stored as a list of type EmailItem. The code for this type is as follows:

Public Class EmailItem
    Public ContentType As String = vbNullString         'CONTENT-TYPE data
    Public ContentTypeData As String = vbNullString     'filename or text encoding
    Public ContentTypeDataIsFilename As Boolean = False 'True if ContentTypeData specifies a filename
    Public ContentEncoding As String = vbNullString     'CONTENT-TRANSFER-ENCODING data
    Public ContentBody As String = vbNullString         'raw data of block
End Class

I have tried using some code such as String.Join but I end up with a blank string.

Please pardon my ignorance as I am new to VB.

Thank you all for all of your help!

Ryan

Upvotes: 0

Views: 3329

Answers (1)

Jeremy Thompson

Reputation: 65594

This is no trivial task. Attachments could be in any number of formats, many of them proprietary: ".pdf", ".doc", ".xls", ".ppt", ".csv", ".vsd", ".zip", ".rar", ".txt", ".html", ".proj", etc.

The good news is that the work has already been done for you. I show how to generically read almost any file format and extract its text in this answer:

Generically read any file format and convert it to .txt format

Make sure you read the "Info to set it up" paragraph there.

So go ahead and add the TikaOnDotnet and TikaOnDotnet.TextExtractor packages to your project using NuGet (Tools menu > NuGet Package Manager).

I am assuming you have already written code to extract the email attachments, using either an Outlook Add-In (see the MSDN article "How to: Programmatically Save Attachments from Outlook E-Mail Items") or just an app that uses Outlook via Interop, e.g.:

In C#:

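using System.IO;
using Outlook = Microsoft.Office.Interop.Outlook;
// Namespace used by recent TikaOnDotnet.TextExtractor releases; older ones may use TikaOnDotNet
using TikaOnDotNet.TextExtraction;

// Static, because IterateMessages is static and cannot reach instance fields
private static readonly TextExtractor _textExtractor = new TextExtractor();
private const string _attachmentTextFilepath = @"c:\temp\EmailAttachmentText.txt";

static void IterateMessages(Outlook.Folder folder)
{
    var fi = folder.Items;
    if (fi != null)
    {
        foreach (Object item in fi)
        {
            // A folder can also hold non-mail items (e.g. meeting requests); skip those
            var mi = item as Outlook.MailItem;
            if (mi == null) continue;

            var attachments = mi.Attachments;
            if (attachments.Count != 0)
            {
                // Outlook collections are 1-based
                for (int i = 1; i <= attachments.Count; i++)
                {
                    //Save email attachments
                    string savedPath = Path.Combine(@"C:\temp", attachments[i].FileName);
                    attachments[i].SaveAsFile(savedPath);

                    //Use TIKA to read the contents of the file
                    TextExtractionResult textExtractionResult = _textExtractor.Extract(savedPath);

                    //Save attachment text to a txt file
                    File.AppendAllText(_attachmentTextFilepath, textExtractionResult.Text);
                }
            }
        }
    }
}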

In VB.Net:

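Imports System.IO
Imports Outlook = Microsoft.Office.Interop.Outlook
' Namespace used by recent TikaOnDotnet.TextExtractor releases; older ones may use TikaOnDotNet
Imports TikaOnDotNet.TextExtraction

' Shared, because IterateMessages is Shared and cannot reach instance fields
Private Shared ReadOnly _textExtractor As New TextExtractor()
Private Const _attachmentTextFilepath As String = "c:\temp\EmailAttachmentText.txt"

Private Shared Sub IterateMessages(folder As Outlook.Folder)
    Dim fi = folder.Items
    If fi IsNot Nothing Then
        For Each item As Object In fi
            ' A folder can also hold non-mail items (e.g. meeting requests); skip those
            Dim mi As Outlook.MailItem = TryCast(item, Outlook.MailItem)
            If mi Is Nothing Then Continue For

            Dim attachments = mi.Attachments
            If attachments.Count <> 0 Then
                ' Outlook collections are 1-based
                For i As Integer = 1 To attachments.Count
                    'Save email attachments
                    Dim savedPath As String = Path.Combine("C:\temp", attachments(i).FileName)
                    attachments(i).SaveAsFile(savedPath)

                    'Use TIKA to read the contents of the file
                    Dim textExtractionResult As TextExtractionResult = _textExtractor.Extract(savedPath)

                    'Save attachment text to a txt file
                    File.AppendAllText(_attachmentTextFilepath, textExtractionResult.Text)
                Next
            End If
        Next
    End If
End Sub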
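
If your attachments are already parsed into the EmailInfo.Attachments list from the question and they are plain text, you may not need Outlook or Tika at all. GetBytes needs a single String, and calling String.Join directly on a List(Of EmailItem) only calls ToString() on each item, so it never sees ContentBody. Below is a minimal sketch that concatenates each item's ContentBody instead. It relies on the EmailInfo and EmailItem classes from the question; WriteAttachmentBodies is a hypothetical helper name, and treating a ContentEncoding value of "base64" as a marker for base64-encoded UTF-8 text is my assumption about how your parser fills that field.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module AttachmentTextSketch
    ' Hypothetical helper (not part of the question's Utilities class).
    ' Writes the body of every attachment in info to outputPath.
    Sub WriteAttachmentBodies(info As EmailInfo, outputPath As String)
        Dim sb As New StringBuilder()

        For Each att As EmailItem In info.Attachments
            ' ContentBody defaults to vbNullString (Nothing), so guard against it
            Dim body As String = If(att.ContentBody, String.Empty)

            ' Assumption: ContentEncoding holds the raw Content-Transfer-Encoding
            ' value and base64 bodies contain UTF-8 text once decoded
            If String.Equals(att.ContentEncoding, "base64", StringComparison.OrdinalIgnoreCase) Then
                body = Encoding.UTF8.GetString(Convert.FromBase64String(body))
            End If

            sb.AppendLine(body)
        Next

        ' Same encoding as in the question: UTF-8 with a byte order mark
        File.WriteAllText(outputPath, sb.ToString(), New UTF8Encoding(True))
    End Sub
End Module
```

You would call it as WriteAttachmentBodies(Utilities.GetEmailInfo(msg.Message), "c:\temp\EmailAttachmentText.txt"). For binary attachments (.pdf, .doc and so on) the decoded bytes are not readable text, so save them to disk and run them through Tika as shown above.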

Upvotes: 2
