Alexander Remesch

Reputation: 623

How to retrieve the Content-Type and the Content-Disposition from an outlook msg file using Apache POI-HSMF?

I need to write a Java program that extracts all attachments from messages saved by Outlook 2016 in the native .msg format. The program should skip inline images. Also, some of the mails have multipart/alternative parts, for which the program should pick the "best" content type, e.g. text/html over text/plain.

In order to do that, I need to find out the content-type and content-disposition of all parts and attachments of the message.

I tried the following:

import java.io.IOException;

import org.apache.poi.hsmf.MAPIMessage;
import org.apache.poi.hsmf.datatypes.AttachmentChunks;

public class ListAttachments {
    public static void main(String[] args) throws IOException {
        String mfile = "test/test2.msg";
        MAPIMessage msg = new MAPIMessage(mfile);

        // Print the attachment properties POI-HSMF exposes for each attachment
        AttachmentChunks[] attachments = msg.getAttachmentFiles();
        for (AttachmentChunks attachment : attachments) {
            System.out.println("long file name = " + attachment.getAttachLongFileName());
            System.out.println("content id = " + attachment.getAttachContentId());
            System.out.println("mime tag = " + attachment.getAttachMimeTag());
            System.out.println("embedded = " + attachment.isEmbeddedMessage());
        }
        msg.close();
    }
}

The problem is that the "mime tag" (i.e. the content type) is returned only for some attachments and is null for all others. The content disposition seems to be missing entirely.

For example, I get the following output for a mail saved by Outlook 2016 (the mail contains a PDF attachment and an inline logo image):

long file name = Vertretungsvollmacht Übersiedlung.pdf
content id = null
mime tag = null
embedded = false
long file name = image001.jpg
content id = [email protected]
mime tag = image/jpeg
embedded = false

Is there a way to get these attributes out of the .msg files, or is there a more complete and convenient way to achieve this in Java with a library other than Apache POI-HSMF?

Upvotes: 1

Views: 1711

Answers (2)

Dmitry Streblechenko

Reputation: 66276

The fact that an attachment has a content-id does not mean it is an embedded image: Lotus Notes, for example, adds a content-id to all attachments. The only reliable check is to load the HTML body and determine which attachments its <img> tags refer to.
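A minimal sketch of that check, assuming the HTML body is already available as a String (with POI-HSMF it can be read via MAPIMessage.getHtmlBody()). The class and method names are illustrative, and the regular expression is a simplification: it only looks at quoted src attributes of <img> tags that use the cid: scheme, so a real HTML parser would be more robust.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InlineImageCheck {

    // Matches <img ... src="cid:XYZ" ...> (single or double quotes) and captures XYZ.
    // Simplification: ignores unquoted src values and cid: references outside <img> tags.
    private static final Pattern IMG_CID = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']cid:([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Returns the content IDs referenced by <img> tags in the given HTML body. */
    public static Set<String> referencedContentIds(String html) {
        Set<String> ids = new LinkedHashSet<>();
        if (html == null) {
            return ids;
        }
        Matcher m = IMG_CID.matcher(html);
        while (m.find()) {
            ids.add(m.group(1).trim());
        }
        return ids;
    }

    /** "inline" only if the attachment's content ID is actually referenced by an <img> tag. */
    public static String disposition(String contentId, Set<String> referencedIds) {
        if (contentId == null || contentId.isEmpty()) {
            return "attachment";
        }
        // Defensive: strip angle brackets in case the ID is stored in MIME header form (<...>).
        String id = contentId.replaceAll("^<|>$", "");
        return referencedIds.contains(id) ? "inline" : "attachment";
    }

    public static void main(String[] args) {
        String html = "<p>Hi</p><img width=100 src=\"cid:image001.jpg@01D2\" alt=\"logo\">";
        Set<String> ids = referencedContentIds(html);
        System.out.println(disposition("image001.jpg@01D2", ids)); // inline
        System.out.println(disposition(null, ids));                // attachment
    }
}
```

Matching the full cid: value instead of searching the body for the raw content ID avoids false positives when one ID happens to be a substring of other text in the HTML.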

Upvotes: 0

Alexander Remesch

Reputation: 623

In order to get the content-disposition (inline or attachment), I did the following:

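    // body: the message's HTML body; contentId: the attachment's content ID as a String (may be null)
    String disposition = "attachment";
    // Compare strings with isEmpty(), not != "" (which only compares references)
    if (contentId != null && !contentId.isEmpty() && body.contains(contentId))
        disposition = "inline";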

To obtain the content type, I derived it from the file extension of the attachment, e.g.:

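        // Extension = text after the last dot (whole name if there is no dot)
        String ext = fileNameOri.substring(fileNameOri.lastIndexOf(".") + 1);
        String ct;
        switch (ext.toLowerCase()) {
        case "xlsx":
            ct = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet";
            break;
        // add further extensions here
        default:
            ct = "application/octet-stream"; // generic fallback for unknown types
        }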

A list of MIME types is available at, e.g., https://wiki.selfhtml.org/wiki/MIME-Type/%C3%9Cbersicht (in German).

Of course, this should only be done when AttachmentChunks.getAttachMimeTag() returns null or an empty string.
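A self-contained sketch of that fallback logic: use the MIME tag when it is present, otherwise map the file extension. The class name and the small extension table are illustrative assumptions. Note that getAttachMimeTag() returns a POI StringChunk, so the caller would pass its getValue() (or null). As a last resort the sketch asks the JDK's built-in table via java.net.URLConnection.guessContentTypeFromName, then falls back to application/octet-stream. It uses Map.of, so Java 9 or later is assumed.

```java
import java.net.URLConnection;
import java.util.Locale;
import java.util.Map;

public class ContentTypeGuesser {

    // Illustrative subset; extend it from a MIME type list such as the one linked above.
    private static final Map<String, String> BY_EXTENSION = Map.of(
            "pdf", "application/pdf",
            "xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
            "docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            "jpg", "image/jpeg",
            "png", "image/png");

    /**
     * Prefers the MIME tag stored in the .msg file and only guesses
     * from the file name when the tag is null or empty.
     */
    public static String contentType(String mimeTag, String fileName) {
        if (mimeTag != null && !mimeTag.trim().isEmpty()) {
            return mimeTag.trim();
        }
        if (fileName != null) {
            int dot = fileName.lastIndexOf('.');
            if (dot >= 0 && dot < fileName.length() - 1) {
                String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
                String ct = BY_EXTENSION.get(ext);
                if (ct != null) {
                    return ct;
                }
            }
            // The JDK's own (limited) table; returns null when it has no match
            String guessed = URLConnection.guessContentTypeFromName(fileName);
            if (guessed != null) {
                return guessed;
            }
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        System.out.println(contentType(null, "Vertretungsvollmacht Übersiedlung.pdf")); // application/pdf
        System.out.println(contentType("image/jpeg", "image001.jpg"));                 // image/jpeg
        System.out.println(contentType("", "Report.XLSX"));
    }
}
```

Keeping the stored MIME tag first means the guess is only used for attachments like the PDF in the question, where Outlook did not record one.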

Upvotes: 1
