Reputation: 320
I'm using a Python 2.x library, email, to iterate over some .eml files, but I have Python 3.x installed.
I extract the filename from the header of each payload (attachment) using .get_filename(). The encoding is not set in the header, so I believe Python 3.x interprets the returned string as utf-8. However, when the filename contains special characters such as "ø", the string looks like this:
=?ISO-8859-1?Q?Sp=F8rgeskema=2Edoc?=
I have failed in numerous ways to convert this string into utf-8: turning it into bytes or not, and decoding and encoding with latin-1, ISO-8859-1 (which should be the same thing) and utf-8.
I've also tried using:
ast.literal_eval(r"b'=?ISO-8859-1?Q?Sp=F8rgeskema=2Edoc?='")
and decoding that, but it still returns the original string containing the encoded characters.
How does one go about this?
Upvotes: 0
Views: 679
Reputation: 9523
You are handling email, so you can use email handling functions:
Try email.header (https://docs.python.org/3.5/library/email.header.html). It is a very small module, and the last (second) example in its documentation is:
>>> from email.header import decode_header
>>> decode_header('=?iso-8859-1?q?p=F6stal?=')
[(b'p\xf6stal', 'iso-8859-1')]
There is also a version for Python 2.7.
So for your case:
import email.header
subj = '=?ISO-8859-1?Q?Sp=F8rgeskema=2Edoc?='
# decode_header returns a list of (value, charset) pairs; take the first one
subject, encoder = email.header.decode_header(subj)[0]
print(subject.decode(encoder))
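The snippet above assumes the header consists of exactly one encoded chunk. As a rough sketch for the more general case, the helper below (decode_mime_words is a hypothetical name, not part of the email package, and the "ascii" fallback is an assumption) joins every chunk and also copes with filenames that are not encoded at all, where decode_header returns a plain str with a charset of None:

```python
from email.header import decode_header, make_header


def decode_mime_words(value, fallback="ascii"):
    """Decode an RFC 2047 encoded header value into a normal str.

    Hypothetical helper: `fallback` is an assumed charset used only when
    a bytes chunk arrives without a declared charset.
    """
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded chunks come back as bytes plus the declared charset.
            parts.append(chunk.decode(charset or fallback, errors="replace"))
        else:
            # Unencoded values come back as str and need no decoding.
            parts.append(chunk)
    return "".join(parts)


filename = decode_mime_words("=?ISO-8859-1?Q?Sp=F8rgeskema=2Edoc?=")

# Alternative using only the standard library: build a Header object
# from the decoded chunks and convert it to str.
filename_via_header = str(
    make_header(decode_header("=?ISO-8859-1?Q?Sp=F8rgeskema=2Edoc?="))
)
```

In this sketch both filename and filename_via_header should hold the readable name, so either can be used on the value returned by .get_filename().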
Upvotes: 2