Brad Solomon

Reputation: 40908

Parsing names with commas from email module's `parseaddr`

email.utils.parseaddr doesn't seem to handle names given in "lastname, firstname" format, which is common in email metadata.

Example:

>>> import email.utils

>>> email.utils.parseaddr('Joe A. Smith <[email protected]>')  # OK
('Joe A. Smith', '[email protected]')

>>> email.utils.parseaddr('Smith, Joe A. <[email protected]>')  # Fails
('', 'Smith')

Is this by design? email purports to follow RFC 2822. The spec defines the full string as

angle-addr      =       [CFWS] "<" addr-spec ">" [CFWS] / obs-angle-addr

But it's unclear to me what can constitute "CFWS". Is the return value ('', 'Smith') compliant with the RFC?


Version info:

>>> import sys
>>> sys.version_info
sys.version_info(major=3, minor=6, micro=6, releaselevel='final', serial=0)

Upvotes: 0

Views: 323

Answers (1)

jwodder

Reputation: 57590

As defined in section 3.2.3 of the RFC, CFWS is comments and folding whitespace, so it does not apply here. The relevant definitions are scattered throughout the grammar:

name-addr       =       [display-name] angle-addr
display-name    =       phrase
phrase          =       1*word / obs-phrase
word            =       atom / quoted-string
atom            =       [CFWS] 1*atext [CFWS]
atext           = [a bunch of characters not including comma]
obs-phrase      =       word *(word / "." / CFWS)

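From this, we can see that 'Joe A. Smith <[email protected]>' is valid because Joe A. Smith is an obs-phrase, but 'Smith, Joe A. <[email protected]>' is not valid because commas aren't allowed in an atom or obs-phrase. An unquoted comma is instead read as the separator between addresses in a list, so the parser takes Smith as the first address, which is where ('', 'Smith') comes from. To keep the comma in the display name, you must use a quoted-string: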

>>> email.utils.parseaddr('"Smith, Joe A." <[email protected]>')
('Smith, Joe A.', '[email protected]')
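If you build the header yourself, email.utils.formataddr adds the quotes for you whenever the display name contains special characters such as a comma or period. For input you don't control, one possible workaround is to quote the display name before parsing. The following is a minimal sketch, not a general-purpose parser: parse_lenient is a hypothetical helper, joe@example.com is a placeholder address, and the split on the last "<" assumes each string holds exactly one address.

```python
from email.utils import formataddr, parseaddr

# formataddr quotes the display name when it contains RFC "specials".
header = formataddr(('Smith, Joe A.', 'joe@example.com'))
print(header)  # "Smith, Joe A." <joe@example.com>


def parse_lenient(raw):
    """Hypothetical helper: parse 'Last, First <addr>' with an unquoted name.

    Assumes `raw` holds a single address; everything before the last '<'
    is treated as the display name.
    """
    name, sep, rest = raw.rpartition('<')
    name = name.strip()
    # Only rewrite when there is an angle-addr and the name is not already quoted.
    if sep and name and not name.startswith('"'):
        addr = rest.partition('>')[0].strip()
        raw = formataddr((name, addr))  # re-emit with a properly quoted name
    return parseaddr(raw)


print(parse_lenient('Smith, Joe A. <joe@example.com>'))
# ('Smith, Joe A.', 'joe@example.com')
```

Strings without an angle-addr, such as a bare 'joe@example.com', are passed to parseaddr unchanged.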

Upvotes: 4
