Jef

Reputation: 811

Parse text/html data with JavaMail

I wrote an application that fetches a message and checks its content:

public void getInhoud(Message msg) throws IOException, Exception {
    Object contt = msg.getContent();
    ...
    if (contt instanceof String) {
          handlePart((Part) msg);
    }
    ...
}

public void handlePart(Part part)
        throws MessagingException, IOException, Exception {

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    String contentType = part.getContentType();
    ...
    // Content-Type may carry parameters (e.g. a charset), so match on the prefix only
    if (contentType.toLowerCase().startsWith("text/html")) {
        part.writeTo(out);
        String stringS = out.toString();
    }
    ...
}

I removed the unnecessary code. This method works for e-mail sent from Gmail, Hotmail, and the Outlook desktop client, but fails for e-mail sent from the Office 365 web client. For every other client the content type is text/plain; only for Office 365 mail is it text/html. The code writes the data of the Part to a ByteArrayOutputStream, which is then converted to a String. This works, or at least the String contains the content of the part, but the HTML it contains is somewhat faulty.

Here is an example: http://pastebin.com/5mEYCHxD (posted to Pastebin, it is pretty big).

Notice the = symbols printed at the end of almost every line. Is this something I can fix in the code, or does it have to be fixed in the mail client?

I thought about looping through every line of HTML and removing the = after checking that it is not part of an HTML tag.

Any help is very much appreciated, this has been bothering me for a few weeks now.

Thanks!

Upvotes: 1

Views: 2720

Answers (1)

Jörn Horstmann

Reputation: 34024

That sounds just like quoted-printable encoding:

Lines of quoted-printable encoded data must not be longer than 76 characters. To satisfy this requirement without altering the encoded text, soft line breaks may be added as desired. A soft line break consists of an "=" at the end of an encoded line, and does not appear as a line break in the decoded text.
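To make the rule above concrete, here is a minimal sketch of a quoted-printable decoder using only the JDK. The class and method names (QuotedPrintableSketch, decode) are made up for illustration, and it assumes the encoded input is plain ASCII. It only shows what the trailing = signs mean; it is not a replacement for the decoding a mail library already does.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class QuotedPrintableSketch {

    // Drops soft line breaks ("=" directly before a line ending) and turns
    // "=XX" hex escapes back into the bytes they stand for.
    static String decode(String encoded, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int n = encoded.length();
        int i = 0;
        while (i < n) {
            char c = encoded.charAt(i);
            if (c != '=') {
                out.write(c); // literal character (assumed ASCII)
                i++;
            } else if (i + 1 < n && encoded.charAt(i + 1) == '\n') {
                i += 2; // soft line break "=\n": emit nothing
            } else if (i + 2 < n && encoded.charAt(i + 1) == '\r'
                    && encoded.charAt(i + 2) == '\n') {
                i += 3; // soft line break "=\r\n": emit nothing
            } else if (i + 2 < n && isHex(encoded.charAt(i + 1))
                    && isHex(encoded.charAt(i + 2))) {
                // "=XX" escape: one byte written as two hex digits
                out.write(Integer.parseInt(encoded.substring(i + 1, i + 3), 16));
                i += 3;
            } else {
                out.write(c); // malformed escape: keep the "=" as-is
                i++;
            }
        }
        return new String(out.toByteArray(), charset);
    }

    private static boolean isHex(char ch) {
        return Character.digit(ch, 16) >= 0;
    }

    public static void main(String[] args) {
        String encoded = "<p>caf=C3=A9 =3D=\r\n ok</p>";
        System.out.println(decode(encoded, StandardCharsets.UTF_8));
    }
}
```

Note that a literal = inside the HTML (for example in an attribute) is itself encoded as =3D, which is why simply stripping = characters at line ends would not be enough to recover the original markup.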

The writeTo method appears to write the content in its still-encoded form, so you have to copy the streams yourself. The getInputStream method is described as returning the decoded InputStream.
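Copying the stream yourself could look roughly like the sketch below. The helper name (DecodedPartReader.readAll) is hypothetical, and the charset is an assumption: in real code it should come from the charset parameter of the part's Content-Type. The helper only depends on the JDK, so the demo feeds it a ByteArrayInputStream in place of a mail part.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class DecodedPartReader {

    // Reads the whole stream into memory and converts it to a String
    // using the given charset.
    static String readAll(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the stream a mail part would provide.
        InputStream in = new ByteArrayInputStream(
                "<p>hello</p>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(in, StandardCharsets.UTF_8));
    }
}
```

In handlePart, the idea would be to call something like readAll(part.getInputStream(), charset) in place of part.writeTo(out). Since getInhoud already checks that msg.getContent() is a String, that String may well be the decoded text too, but I have not verified this against Office 365 mail.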

Upvotes: 1
