Reputation: 1241
Using the email.header module, I can do

```python
the_text, the_charset = decode_header(inputText)[0]
```

to get the character set of the email header, where inputText was retrieved with a call like

```python
inputText = msg.get('From')
```

(using the From: header as an example).

To extract the header encoding for that header, do I have to do something like this?

```python
the_header_encoding = email.charset.Charset(the_charset).header_encoding
```
That is, do I have to create an instance of the Charset class based on the name of the charset (and would that even work?), or is there a way to extract the header encoding more directly from the header itself?
Upvotes: 0
Views: 244
Reputation: 1122152
An RFC 2047 encoded header can span one or more (folded) lines, and each line can use a different encoding, or no encoding at all.
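As for the Charset approach from the question: it runs, but `Charset(...).header_encoding` comes from the email package's own charset registry and describes how Python would encode a header in that charset, not what the sender actually used. The small check below is a sketch with made-up header values; the helper name `charset_default_encodings` is hypothetical. A Q-encoded UTF-8 word is reported as `SHORTEST`, and a B-encoded ISO-8859-1 word is reported as quoted-printable.

```python
import email.charset
from email.header import decode_header

# Readable names for the email.charset encoding constants.
ENCODING_NAMES = {
    email.charset.QP: "quoted-printable",
    email.charset.BASE64: "base64",
    email.charset.SHORTEST: "shortest",
    None: "none",
}

def charset_default_encodings(raw_header):
    """Hypothetical helper: (charset, registry default header encoding) per encoded chunk."""
    result = []
    # decode_header() yields (bytes, charset) pairs; the Q/B encoding is not included.
    for _text, charset in decode_header(raw_header):
        if charset is None:  # unencoded run of text
            continue
        header_encoding = email.charset.Charset(charset).header_encoding
        result.append((charset, ENCODING_NAMES[header_encoding]))
    return result

# Sample headers (made up): Q-encoded UTF-8 and B-encoded ISO-8859-1.
print(charset_default_encodings("=?utf-8?q?Andr=C3=A9?= <andre@example.com>"))
# [('utf-8', 'shortest')]
print(charset_default_encodings("=?iso-8859-1?b?QW5kcuk=?="))
# [('iso-8859-1', 'quoted-printable')]
```

In both cases the registry answer differs from the encoding actually used in the header.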
decode_header() only gives you the charset of each decoded chunk, not whether it was Q- or B-encoded, so you'll have to parse the encoding out yourself, one per line. Using a regular expression:
```python
import re

# An RFC 2047 encoded word: =?charset?Q-or-B?encoded-text?=
encoded_word = re.compile(r'=\?[\w-]+\?(?P<encoding>[QB])\?[^?]+?\?=', flags=re.I)
encodings = {'Q': 'quoted-printable', 'B': 'base64'}

def encoded_message_codecs(header):
    used = []
    # Inspect each (folded) header line; search() finds the first encoded word on it.
    for line in header.splitlines():
        entry = encoded_word.search(line)
        if not entry:
            used.append(None)  # no encoded word on this line
            continue
        used.append(encodings.get(entry.group('encoding').upper(), 'unknown'))
    return used
```
This returns a list with one entry per line: 'quoted-printable', 'base64', or None if that line contains no encoded word. (The 'unknown' fallback is purely defensive; the pattern only ever matches Q or B.)
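Because `search()` stops at the first match, the function above reports at most one encoding per line, while RFC 2047 allows several encoded words on the same line. A per-word variant could use `finditer()` and capture the charset as well; this is a sketch, and the names `ENCODED_WORD` and `encoded_word_codecs` are hypothetical.

```python
import re

# Same pattern as above, with the charset captured as a named group too.
ENCODED_WORD = re.compile(
    r'=\?(?P<charset>[\w-]+)\?(?P<encoding>[QB])\?[^?]+?\?=', flags=re.I)
ENCODINGS = {'Q': 'quoted-printable', 'B': 'base64'}

def encoded_word_codecs(header):
    """Hypothetical helper: (charset, encoding) for every encoded word, in order."""
    return [
        (m.group('charset').lower(), ENCODINGS[m.group('encoding').upper()])
        for m in ENCODED_WORD.finditer(header)
    ]

# Sample folded header (made up) with two encoded words on its first line.
raw = ("=?utf-8?q?Andr=C3=A9?= =?iso-8859-1?b?QW5kcuk=?=\n"
       " <andre@example.com>")
print(encoded_word_codecs(raw))
# [('utf-8', 'quoted-printable'), ('iso-8859-1', 'base64')]
```

Returning (charset, encoding) pairs keeps the result aligned with the charsets decode_header() reports, at the cost of no longer telling you which line each word came from.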
Upvotes: 1