Reputation: 39
I am working with Microsoft Visual Studio 2015 and I am trying to convert something from list form to string form. I have found some solutions for similar problems but not this one in particular.
I would like to eventually have this code work:
Dim info As Byte() = New UTF8Encoding(True).GetBytes(Utilities.GetEmailInfo(msg.Message).Attachments)
My end goal is to take the text from Attachments and write it to a file. If I use one of the String fields listed below, such as ToData, the file turns out properly. With Attachments, however, the code above fails because GetBytes cannot get the text from a list. Is there another function I could use to get the text from the list?
The class that I need to convert contains the following:
Public Class EmailInfo
    Public FromData As String = vbNullString 'FROM:
    Public ToData As String = vbNullString 'TO:
    Public DateData As String = vbNullString 'DATE:
    Public SubjectData As String = vbNullString 'SUBJECT:
    Public MessageBody As EmailItem 'contents of message body
    Public AlternateViews As New Collections.Generic.List(Of EmailItem) 'list of alternate views
    Public Attachments As New Collections.Generic.List(Of EmailItem) 'list of attachments
End Class
The resource that I want to access is EmailInfo.Attachments. This resource is stored as a list of type EmailItem. The code for this type is as follows:
Public Class EmailItem
    Public ContentType As String = vbNullString 'CONTENT-TYPE data
    Public ContentTypeData As String = vbNullString 'filename or text encoding
    Public ContentTypeDataIsFilename As Boolean = False 'True if ContentTypeData specifies a filename
    Public ContentEncoding As String = vbNullString 'CONTENT-TRANSFER-ENCODING data
    Public ContentBody As String = vbNullString 'raw data of block
End Class
I have tried using some code such as String.Join but I end up with a blank string.
Please pardon my ignorance as I am new to VB.
Thank you all for all of your help!
Ryan
Upvotes: 0
Views: 3329
Reputation: 65594
This is no trivial task. Attachments could be in any number of proprietary formats: ".pdf", ".doc", ".xls", ".ppt", ".csv", ".vsd", ".zip", ".rar", ".txt", ".html", ".proj", etc.
The good news is that all the work has already been done for you; I show how to generically read almost any file format and extract the text in this answer:
Generically read any file format and convert it to .txt format
Make sure you read the "Info to set it up" paragraph.
So go ahead and add references to TikaOnDotnet & TikaOnDotnet.TextExtractor in your project using NuGet (Tools menu > NuGet Package Manager).
I am assuming you have written code to extract the email attachments, either with an Outlook Add-In (see MSDN, "How to: Programmatically Save Attachments from Outlook E-Mail Items") or with an app that uses Outlook via Interop, e.g.:
In C#:
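using System.IO;
using Outlook = Microsoft.Office.Interop.Outlook;
using TikaOnDotNet.TextExtraction; // namespace may differ in older TikaOnDotnet versions

// Static and initialized so the static method below can use them
private static readonly TextExtractor _textExtractor = new TextExtractor();
private static readonly string _attachmentTextFilepath = @"c:\temp\EmailAttachmentText.txt";

static void IterateMessages(Outlook.Folder folder)
{
    var fi = folder.Items;
    if (fi == null) return;
    foreach (object item in fi)
    {
        // A folder can hold non-mail items (e.g. meeting requests); skip those
        var mi = item as Outlook.MailItem;
        if (mi == null) continue;
        var attachments = mi.Attachments;
        for (int i = 1; i <= attachments.Count; i++) // Outlook collections are 1-based
        {
            // Save the email attachment
            string savedPath = Path.Combine(@"C:\temp", attachments[i].FileName);
            attachments[i].SaveAsFile(savedPath);
            // Use Tika to read the contents of the file
            TextExtractionResult textExtractionResult = _textExtractor.Extract(savedPath);
            // Append the attachment text to a .txt file
            File.AppendAllText(_attachmentTextFilepath, textExtractionResult.Text);
        }
    }
}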
In VB.Net:
Imports System.IO
Imports Outlook = Microsoft.Office.Interop.Outlook
Imports TikaOnDotNet.TextExtraction ' namespace may differ in older TikaOnDotnet versions

' Shared and initialized so the Shared method below can use them
Private Shared ReadOnly _textExtractor As New TextExtractor()
Private Shared ReadOnly _attachmentTextFilepath As String = "c:\temp\EmailAttachmentText.txt"

Private Shared Sub IterateMessages(folder As Outlook.Folder)
    Dim fi = folder.Items
    If fi Is Nothing Then Return
    For Each item As Object In fi
        ' A folder can hold non-mail items (e.g. meeting requests); skip those
        Dim mi = TryCast(item, Outlook.MailItem)
        If mi Is Nothing Then Continue For
        Dim attachments = mi.Attachments
        For i As Integer = 1 To attachments.Count ' Outlook collections are 1-based
            ' Save the email attachment
            Dim savedPath As String = Path.Combine("C:\temp", attachments(i).FileName)
            attachments(i).SaveAsFile(savedPath)
            ' Use Tika to read the contents of the file
            Dim textExtractionResult As TextExtractionResult = _textExtractor.Extract(savedPath)
            ' Append the attachment text to a .txt file
            File.AppendAllText(_attachmentTextFilepath, textExtractionResult.Text)
        Next
    Next
End Sub
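If the attachments are already parsed into the List(Of EmailItem) from the question and their ContentBody happens to be plain text, a simpler route may be enough. The following is only a minimal sketch under that assumption: the EmailItem class is cut down to the one field used, JoinBodies is a hypothetical helper name, and the output goes to the temp folder for illustration. ContentBody is described as raw data, so it may still be encoded (for example base64, depending on ContentEncoding); this sketch does not decode it.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Text

Module AttachmentBodySketch
    ' Cut-down stand-in for the question's EmailItem class (only the field used here).
    Public Class EmailItem
        Public ContentBody As String = Nothing
    End Class

    ' Hypothetical helper: joins the raw ContentBody of each item, one per line.
    Function JoinBodies(items As IEnumerable(Of EmailItem)) As String
        Return String.Join(Environment.NewLine, items.Select(Function(item) item.ContentBody))
    End Function

    Sub Main()
        ' Sample data standing in for Utilities.GetEmailInfo(msg.Message).Attachments.
        Dim attachments As New List(Of EmailItem) From {
            New EmailItem With {.ContentBody = "first attachment"},
            New EmailItem With {.ContentBody = "second attachment"}
        }

        ' GetBytes needs a String, so flatten the list to one String first.
        Dim info As Byte() = New UTF8Encoding(True).GetBytes(JoinBodies(attachments))

        ' Illustrative output location; substitute your own path.
        Dim target As String = Path.Combine(Path.GetTempPath(), "EmailAttachmentText.txt")
        File.WriteAllBytes(target, info)
    End Sub
End Module
```

In the question's code, the sample list would be replaced by Utilities.GetEmailInfo(msg.Message).Attachments. For binary formats such as .pdf or .doc this will not give readable text, which is where the Tika approach above comes in.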
Upvotes: 2