Reputation: 5142
I'm trying to extract the HTML email bodies from Outlook msg files. I've successfully converted them to eml/standard RFC 822 files using email-outlook-message-perl, but the bodies of the emails are HTML wrapped in RTF. Here's an example snippet:
{\*\htmltag96 <div class="EduText" style="padding:2px;border-width:1px;background-color:#DEE5ED;border-color:##FAFAFA;border-style:solid;">}\htmlrtf {\htmlrtf0 {\*\htmltag64}\htmlrtf {\htmlrtf0 \htmlrtf{\f4\fs24\htmlrtf0 \'cd\'d5\'e0\'c1\'c5\'b9\'d5\'e9\'ca\'e8\'a7\'e4\'bb\'b7\'d5\'e8 john.smith\htmlrtf\f0}\htmlrtf0
{\*\htmltag116 <br>}\htmlrtf \line
\htmlrtf0
Is there a way to get the HTML content without all of the RTF crud?
Upvotes: 1
Views: 1452
Reputation: 475
This thread is a few years old, but this might be helpful for someone who is new to TNEF and in a similar situation...
If you are a Linux user, you can extract the HTML content from the RTF file using the command-line tool unrtf:
unrtf message.rtf
This prints the HTML content to standard output.
If you want to redirect it into a file, you could try unrtf message.rtf > message.html
Hope this helps...
-Suresh
Upvotes: 1
Reputation: 2658
Microsoft uses TNEF (Transport Neutral Encapsulation Format), so I think you need to search for a TNEF Python implementation like:
Upvotes: 0