Parsing Email Headers Tabs

Question

I am parsing emails with the Python email module.

When I parse a message with the Python email parser, the tab at the start of a folded header value is not removed:

from email.parser import Parser
from email.policy import default

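# The continuation lines of Message-ID and Subject start with a tab,
# written here as \t so that it survives copying.
testmail = """Date: Wed, 26 Jan 2022 10:45:29 +0100
Message-ID:
\t<123123123123123123123123123123123123123.testinst.themultiverse.com>
Subject:
\t=?iso-8859-1?Q?Auftragsbest=E4tigung_blablabla?=
 =?iso-8859-1?Q?_one nice thing?=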

Content Body Whatnot"""


message = Parser(policy=default).parsestr(testmail)

print(repr(message["Message-Id"]))
print(repr(message["Subject"]))

results in:

'\t<123123123123123123123123123123123123123.testinst.themultiverse.com>'
'\tAuftragsbestätigung blablabla one nice thing'

I have tried the different policies of the email parser, but I have not managed to remove the leading tab. I saw that the header_source_parse method of the EmailPolicy class does strip whitespace, but apparently only from the text on the header's first line, directly after the colon, and not from the continuation lines that are joined on afterwards.

/email/policy.py:

[...]
        value = value.lstrip(' \t') + ''.join(sourcelines[1:])
[...]

I am not sure whether that is intended behavior or a bug.
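
To make the effect of that line concrete, here is a minimal sketch that calls header_source_parse on the default policy directly. The message ID `<id@example.com>` is a made-up placeholder, not taken from the real mail. It suggests that the lstrip stops at the newline that follows the colon, so the tab on the continuation line is kept:

```python
from email.policy import default

# Header value starts on the line after the colon (as in the test mail).
# "<id@example.com>" is a placeholder value used only for illustration.
folded = ["Message-ID:\n", "\t<id@example.com>\n"]

# Header value starts on the same line as the colon.
inline = ["Message-ID: \t<id@example.com>\n"]

folded_name, folded_value = default.header_source_parse(folded)
inline_name, inline_value = default.header_source_parse(inline)

print(repr(folded_value))  # '\n\t<id@example.com>'
print(repr(inline_value))  # '<id@example.com>'
```

In the folded case the first source line contributes only a newline, which lstrip(' \t') does not touch. The line break itself is dropped later when the header is fetched, which would leave the tab as the first character.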

My question now: is there a way to do this with the standard library alone, or do I need to write a custom policy? The emails come unmodified from an IMAP server (Exchange), so it feels strange that the standard tools do not cover this.
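
For reference, a custom policy along these lines could look like the sketch below. It is one possible approach, not a confirmed standard-library recipe: the class name `StripLeadingWhitespacePolicy` and the sample message are hypothetical, and the only library hook it relies on is the documented header_source_parse method of EmailPolicy.

```python
from email.parser import Parser
from email.policy import EmailPolicy


class StripLeadingWhitespacePolicy(EmailPolicy):
    """Hypothetical policy that also strips whitespace left over from folding."""

    def header_source_parse(self, sourcelines):
        # Let EmailPolicy do its normal parsing first.
        name, value = super().header_source_parse(sourcelines)
        # Then drop any leading line break, space, or tab that remains
        # when the value starts on a continuation line.
        return name, value.lstrip(" \t\r\n")


# Made-up sample message; the header values start on continuation lines.
sample = (
    "Message-ID:\n"
    "\t<id@example.com>\n"
    "Subject:\n"
    "\tHello\n"
    " world\n"
    "\n"
    "Body\n"
)

message = Parser(policy=StripLeadingWhitespacePolicy()).parsestr(sample)
message_id = str(message["Message-ID"])
subject = str(message["Subject"])

print(repr(message_id))
print(repr(subject))
```

Because the stripping happens while the source is parsed, the stored header value no longer matches the original folding exactly, which may matter if the message is serialized again. If only a few headers are read, calling .strip() on the returned value is likely the simpler option.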
