Reputation: 69981
I have a small Python script that pulls emails from a POP mail account and dumps each one into its own file (one file per email).
Then a PHP script runs through the files and displays them.
I am having an issue with ISO-8859-1 (Latin-1) encoded emails.
Here's an example of the text I get: =?iso-8859-1?Q?G=EDsli_Karlsson?= and Sj=E1um hva=F0 =F3li er kl=E1r J
This is the code I use to pull the emails:
pop = poplib.POP3(server)
mail_list = pop.list()[1]
for m in mail_list:
    mno, size = m.split()
    lines = pop.retr(mno)[1]
    file = StringIO.StringIO("\r\n".join(lines))
    msg = rfc822.Message(file)
    body = file.readlines()
    f = open(str(random.randint(1,100)) + ".email", "w")
    f.write(msg["From"] + "\n")
    f.write(msg["Subject"] + "\n")
    f.write(msg["Date"] + "\n")
    for b in body:
        f.write(b)
I have tried probably every combination of encode / decode in both Python and PHP.
Upvotes: 1
Views: 4122
Reputation: 1
I ran into the same issue; the solution below works for me. Note that it needs email.header imported in your Python file.
import email.header

def get_decoded_mailsubject(strsubject):
    decoded_parts = []
    # decode_header returns a list of (part, charset) pairs
    for part, encoding in email.header.decode_header(strsubject):
        if isinstance(part, bytes):
            try:
                decoded_parts.append(part.decode(encoding or 'utf-8'))
            except (UnicodeDecodeError, LookupError):
                # undecodable bytes or an unknown charset name: fall back to Latin-1
                decoded_parts.append(part.decode('latin1'))
        else:
            # unencoded parts may already be str
            decoded_parts.append(part)
    return ''.join(decoded_parts)
Upvotes: 0
Reputation: 69981
There is a better way to do this, but this is what I ended up with. Thanks for your help, guys.
import poplib, quopri
import random, md5
import sys, rfc822, StringIO
import email
from email.Generator import Generator

user = "[email protected]"
password = "password"
server = "mail.example.com"

# connects
try:
    pop = poplib.POP3(server)
except:
    print "Error connecting to server"
    sys.exit(-1)

# user auth
try:
    print pop.user(user)
    print pop.pass_(password)
except:
    print "Authentication error"
    sys.exit(-2)

# gets the mail list
mail_list = pop.list()[1]
for m in mail_list:
    mno, size = m.split()
    message = "\r\n".join(pop.retr(mno)[1])
    message = email.message_from_string(message)

    # uses the email flatten
    out_file = StringIO.StringIO()
    message_gen = Generator(out_file, mangle_from_=False, maxheaderlen=60)
    message_gen.flatten(message)
    message_text = out_file.getvalue()

    # fixes mime encoding issues (for display within html)
    clean_text = quopri.decodestring(message_text)
    msg = email.message_from_string(clean_text)

    # finds the last body (when in mime multipart, html is the last one)
    for part in msg.walk():
        if part.get_content_type():
            body = part.get_payload(decode=True)

    filename = "%s.email" % random.randint(1,100)
    email_file = open(filename, "w")
    email_file.write(msg["From"] + "\n")
    email_file.write(msg["Return-Path"] + "\n")
    email_file.write(msg["Subject"] + "\n")
    email_file.write(msg["Date"] + "\n")
    email_file.write(body)
    email_file.close()

pop.quit()
sys.exit()
Upvotes: 2
Reputation: 35459
That's the MIME encoding of headers, RFC 2047. Here is how to decode it in Python:
import email.Header
import sys

header_and_encoding = email.Header.decode_header(sys.stdin.readline())
for part in header_and_encoding:
    if part[1] is None:
        print part[0],
    else:
        upart = (part[0]).decode(part[1])
        print upart.encode('latin-1'),
print
More detailed explanations (in French) in http://www.bortzmeyer.org/decoder-en-tetes-courrier.html
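For anyone on Python 3 (the snippet above is Python 2), here is a minimal sketch of the same decoding. It assumes the standard email.header module; decode_mime_header is just a name made up for this example.

```python
from email.header import decode_header, make_header


def decode_mime_header(raw_header):
    """Decode an RFC 2047 header value into a plain Unicode string.

    Hypothetical helper name, used only for this illustration.
    """
    # decode_header splits the value into (chunk, charset) pairs;
    # make_header joins them again, decoding each chunk with its charset.
    return str(make_header(decode_header(raw_header)))


if __name__ == "__main__":
    # The encoded name from the question
    print(decode_mime_header("=?iso-8859-1?Q?G=EDsli_Karlsson?="))  # Gísli Karlsson
```

Since the result is a normal str, it can be written to the file in whatever encoding the PHP side expects (for example UTF-8).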
Upvotes: 2
Reputation: 1606
You can use the Python email library (Python 2.5+) to avoid these problems:
import email
import poplib
import random
from cStringIO import StringIO
from email.generator import Generator

pop = poplib.POP3(server)
mail_count = len(pop.list()[1])
# POP3 message numbers start at 1, not 0
for message_num in xrange(1, mail_count + 1):
    message = "\r\n".join(pop.retr(message_num)[1])
    message = email.message_from_string(message)

    # flatten the parsed message back into a string
    out_file = StringIO()
    message_gen = Generator(out_file, mangle_from_=False, maxheaderlen=60)
    message_gen.flatten(message)
    message_text = out_file.getvalue()

    filename = "%s.email" % random.randint(1,100)
    email_file = open(filename, "w")
    email_file.write(message_text)
    email_file.close()
This code will get all the messages from your server, turn them into Python message objects, and then flatten them out into strings again for writing to the file. By using the email package from the Python standard library, MIME encoding and decoding issues should be handled for you.
DISCLAIMER: I have not tested that code, but it should work just fine.
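To get at the decoded body text rather than the flattened message, something along these lines might help. This is only a sketch assuming Python 3; extract_text_body and the sample message are made up for illustration, and the Latin-1 fallback for parts that declare no charset is an assumption.

```python
import email


def extract_text_body(raw_bytes):
    """Return the first text/plain part of a raw message as Unicode text.

    Hypothetical helper name, used only for this illustration.
    """
    msg = email.message_from_bytes(raw_bytes)
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            # decode=True undoes quoted-printable or base64 and returns bytes
            payload = part.get_payload(decode=True)
            # Assumption: fall back to Latin-1 when no charset is declared
            charset = part.get_content_charset() or "iso-8859-1"
            return payload.decode(charset, errors="replace")
    return None


# Made-up sample message carrying the body text from the question
raw = (
    b"Subject: test\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Sj=E1um hva=F0 =F3li er kl=E1r\r\n"
)

if __name__ == "__main__":
    print(extract_text_body(raw))
```

The sketch stops at the first text/plain part; a multipart message with only an HTML part would return None and would need its own handling.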
Upvotes: 3
Reputation: 14743
Until very recently, plain Latin-N or UTF-N text was not allowed in headers, which means non-ASCII characters have to be encoded using the method first described in RFC 1522 and later superseded by RFC 2047. Accented characters are encoded either in quoted-printable or in Base64, as indicated by ?Q? (or ?B? for Base64). You'll have to decode them. Oh, and in the Q encoding a space is encoded as "_". See Wikipedia.
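To make the format concrete, here is a rough sketch of decoding such encoded words by hand, assuming Python 3. The regex and the function names are invented for this example, and it ignores details such as dropping the whitespace between adjacent encoded words, so the standard email.header module is the safer choice in practice.

```python
import base64
import quopri
import re

# =?charset?encoding?payload?= where encoding is Q or B (case-insensitive)
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([QqBb])\?([^?]*)\?=")


def decode_encoded_word(match):
    """Decode one encoded word (hypothetical helper for this sketch)."""
    charset, encoding, payload = match.groups()
    if encoding.upper() == "Q":
        # header=True makes quopri turn "_" back into a space
        raw = quopri.decodestring(payload.encode("ascii"), header=True)
    else:
        raw = base64.b64decode(payload)
    return raw.decode(charset)


def decode_header_text(text):
    """Replace every encoded word in text with its decoded form."""
    return ENCODED_WORD.sub(decode_encoded_word, text)


if __name__ == "__main__":
    print(decode_header_text("=?iso-8859-1?Q?G=EDsli_Karlsson?="))  # Q encoding
    print(decode_header_text("=?utf-8?B?R8Otc2xp?="))  # Base64 encoding
```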
Upvotes: 1
Reputation: 340171
That's MIME content, and that's what the email actually looks like; it's not a bug somewhere. You have to use a MIME decoding library (or decode it yourself manually) on the PHP side of things (which, if I understood correctly, is the one acting as email renderer).
In Python you'd use mimetools (or the email package, which replaces it in newer versions). In PHP, I'm not sure. It seems the Zend framework has a MIME parser somewhere, and there are probably zillions of snippets floating around.
http://en.wikipedia.org/wiki/MIME#Encoded-Word
Upvotes: 0