Tamazin

Reputation: 103

PHP IMAP: how to get just the text part of the body, without the <html> tags etc.?

I'm trying to write a script that downloads email from an Exchange server and later inserts it into a database, but I'm having trouble getting the 'text part' of the emails in a good way.

PHP code:

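<?php
$user = "[email protected]";
$password = "password123";
$mbox = imap_open("{exchange01:993/imap/ssl/novalidate-cert}", $user, $password);

// imap_open() returns false on failure, so check before using the connection.
if ($mbox === false) {
    die("Could not connect: " . imap_last_error());
}

// Fetch section 1 of message number 1.
$message = imap_fetchbody($mbox, 1, "1");

print_r($message);

imap_close($mbox);
?>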

and the entire HTML body gets printed. I guess that's to be expected, but I'd like to not have the

<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=iso-8859-1"><meta name=Generator content="Microsoft Word 15 (filtered medium)"><!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
    {font-family:"Cambria Math";
    panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
    {font-family:Calibri;
    panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
    {font-family:Verdana;
    panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
    {font-family:"Neo Sans Std";}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
    {margin:0cm;
    margin-bottom:.0001pt;
    font-size:11.0pt;
    font-family:"Calibri",sans-serif;
    mso-fareast-language:EN-US;}
a:link, span.MsoHyperlink
    {mso-style-priority:99;
    color:#0563C1;
    text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
    {mso-style-priority:99;
    color:#954F72;
    text-decoration:underline;}
span.E-postmall17

...mumbo jumbo, just the text of the email itself (I can live with the signature, images and so on).

Is there no easier way than roughly cutting the long string at <body... to </body... and then cutting it further from there? There must be other people who have wanted to solve the same problem, but I can't find an answer after spending an entire day trying to solve it and googling it.

I guess in the end I'll just insert the entire HTML response into the database cell and hope for the best, but I'd rather not.

Help me, Stack Overflow. You're my only hope.

Solution edit:

Not the exact solution I would've liked, but it does work (with some slight fixing to do).

echo strip_tags($message, '<body>');

Outputs just the

<body...>
Yayh the text i want!
</body .....>

part. Thanks a lot @ThisGuyHasTwoThumbs (in the comments).

Edit:

In the end the code became roughly this

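<?php
$user = "[email protected]";
$password = "password";
$mbox = imap_open("{exchange01:993/imap/ssl/novalidate-cert}", $user, $password);

// imap_open() returns false on failure, so check before using the connection.
if ($mbox === false) {
    die("Could not connect: " . imap_last_error());
}

$message = imap_fetchbody($mbox, 1, "1");

// Remove every tag except <body ...> and </body>.
$message = strip_tags($message, '<body>');
// Keep what follows the first ">" (the end of the opening <body ...> tag)...
$message = explode(">", $message);
// ...and what precedes the next "<" (the start of </body>).
$message = explode("<", $message[1]);
$message = str_replace("&nbsp;", "", $message[0]);
$message = html_entity_decode($message);
$message = trim($message);
// Or the three lines above combined into one:
// $message = trim(html_entity_decode(str_replace("&nbsp;", "", $message[0])));

echo $message;

imap_close($mbox);
?>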

This removes the opening <body something something something> tag and the closing </body> tag, and then trims the whitespace at the beginning and end of the variable (which @Goose also more or less covered in his edited answer below). It also converts HTML entities to the corresponding characters and removes the &nbsp; entities.
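A variant of the cleanup above that does not depend on where the first ">" and "<" happen to fall could look roughly like this. It is only a sketch, not the code I actually ran: the helper name htmlBodyToText is made up for illustration, it assumes the HTML has already been transfer-decoded and is UTF-8, and regex-based HTML handling will not cope with every malformed message.

```php
<?php
// Hypothetical helper: reduce an HTML mail body to plain text
// using only core PHP string and PCRE functions.
function htmlBodyToText(string $html): string
{
    // Drop blocks whose contents are never visible text (CSS, scripts, head).
    $html = preg_replace('#<(head|style|script)\b[^>]*>.*?</\1>#is', '', $html) ?? $html;
    // Turn common block-level endings into line breaks so paragraphs stay apart.
    $html = preg_replace('#<br\s*/?>|</(p|div|tr|li|h[1-6])>#i', "\n", $html) ?? $html;

    $text = strip_tags($html);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // &nbsp; decodes to U+00A0; replace it with a regular space.
    $text = str_replace("\u{00A0}", ' ', $text);
    // Collapse runs of spaces/tabs, then whitespace around line breaks.
    $text = preg_replace('/[ \t]+/', ' ', $text) ?? $text;
    $text = preg_replace('/\s*\n\s*/', "\n", $text) ?? $text;

    return trim($text);
}

$sample = '<html><head><style>p.MsoNormal {margin:0cm;}</style></head>'
        . '<body lang=SV><div><p class=MsoNormal>Yayh the text&nbsp;I want!</p>'
        . '<p>Second &amp; last line</p></div></body></html>';

echo htmlBodyToText($sample), "\n";
```

For the sample string this prints the two paragraphs on separate lines, with the style rules gone and the entities decoded.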

Upvotes: 3

Views: 5391

Answers (3)

Techospert_Deepak

Reputation: 1

You can convert the email body to plain text like this:

$ClearText = preg_replace( "/\n\s+/", "\n", rtrim(html_entity_decode(strip_tags($body))) );
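To see what each step of that one-liner does, here it is wrapped in a function and run on a small input. The function name toClearText and the sample string are assumptions made for this illustration only.

```php
<?php
// Hypothetical wrapper around the one-liner, split into its steps.
function toClearText(string $body): string
{
    $text = strip_tags($body);          // remove the tags, keep the text between them
    $text = html_entity_decode($text);  // &amp; becomes &
    $text = rtrim($text);               // drop trailing whitespace
    // A newline followed by more whitespace collapses into a single newline.
    return (string) preg_replace("/\n\s+/", "\n", $text);
}

$body = "<p>Hello</p>\n\n   <p>Tom &amp; Jerry</p>\n  ";
echo toClearText($body), "\n";
```

Note that strip_tags() keeps whatever text sits between the tags, so the contents of a <style> block that is not wrapped in an HTML comment would survive this step.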

Upvotes: 0

Dave

Reputation: 3288

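Use

$message = imap_fetchbody($mbox, 1, "1.1");

For a message laid out like the one below, this gives you the plain-text part of the message instead of the entire body contents; use "1.2" if you want the HTML part. Pass the section as a string, since it is a section specifier rather than a number.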

(empty) - Entire message
0 - Message header
1 - MULTIPART/ALTERNATIVE
1.1 - TEXT/PLAIN
1.2 - TEXT/HTML
2 - MESSAGE/RFC822 (entire attached message)
2.0 - Attached message header
2.1 - TEXT/PLAIN
2.2 - TEXT/HTML
2.3 - file.ext

This is from the second comment on http://php.net/manual/en/function.imap-fetchbody.php, which also has some nice functions for dynamically working out the available message parts, so you don't have to worry too much about what type of message and data you are dealing with.
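As a rough idea of what working out the section dynamically can look like, here is a minimal sketch that walks the object returned by imap_fetchstructure() and returns the section number of the first text/plain part. It is not the code from that manual comment: the function name findPlainTextSection is made up, it relies only on the documented type, subtype and parts properties (type 0 means text), and it does not handle the special numbering inside attached MESSAGE/RFC822 parts. The demo builds the structure by hand so it runs without a mail server.

```php
<?php
// Hypothetical helper: find the section number of the first text/plain part.
function findPlainTextSection(object $structure, string $prefix = ''): ?string
{
    $isText  = ($structure->type ?? null) === 0;  // 0 = text
    $subtype = strtoupper($structure->subtype ?? '');

    if ($isText && $subtype === 'PLAIN') {
        // A single-part message has no prefix; its body is section "1".
        return $prefix === '' ? '1' : $prefix;
    }

    // Children of a multipart are numbered 1..n below the parent's prefix.
    foreach ($structure->parts ?? [] as $index => $part) {
        $section = ($prefix === '' ? '' : $prefix . '.') . ($index + 1);
        $found = findPlainTextSection($part, $section);
        if ($found !== null) {
            return $found;
        }
    }

    return null; // no text/plain part, e.g. an HTML-only message
}

// Hand-built stand-in for an imap_fetchstructure() result:
// multipart/mixed containing multipart/alternative (plain + html) and an attachment.
$plain = (object) ['type' => 0, 'subtype' => 'PLAIN'];
$html  = (object) ['type' => 0, 'subtype' => 'HTML'];
$alt   = (object) ['type' => 1, 'subtype' => 'ALTERNATIVE', 'parts' => [$plain, $html]];
$mixed = (object) ['type' => 1, 'subtype' => 'MIXED',
                   'parts' => [$alt, (object) ['type' => 3, 'subtype' => 'PDF']]];

echo findPlainTextSection($mixed), "\n"; // 1.1
```

With a live connection you would pass imap_fetchstructure($mbox, 1) instead of the hand-built object and hand the result to imap_fetchbody(). The fetched part may still be base64 or quoted-printable encoded (encoding 3 or 4 in the structure), in which case imap_base64() or imap_qprint() decodes it.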

Upvotes: 2

Goose

Reputation: 4821

What you want is strip_tags()

http://php.net/manual/en/function.strip-tags.php

$html = '<div>hello</div>';
$text = strip_tags($html);
echo $text; // hello

If you need to remove excess whitespace from the resulting string, use the following. Note that it also removes newlines. Credit to the question "Remove excess whitespace from within a string".

$text = preg_replace('/\s+/', ' ', $text);

Upvotes: 1
