Zeedox
Zeedox

Reputation: 80

Are newlines in MIME headers using encoded-words legal?

RFC 2047 defines the encoded-words mechanism for encoding non-ASCII character in MIME documents. It specifies that whitespace characters (space and tabs) are not allowed inside the encoded-word.

However, RFC 5322 for parsing email MIME documents specifies that long header lines should be "folded". Should this folding take place before or after encoded-words decoding?

I recently received an email where encoded-text part of the header had a newline in it, like this:

Header: =?UTF-8?Q?=C3=A5
 =C3=A4?=

Would this be valid?

Of course emails can be invalid in lots of exciting ways and the parser needs to handle that, but it's interesting to know the "correct" way. :)

Upvotes: 2

Views: 889

Answers (1)

Cupcake Protocol
Cupcake Protocol

Reputation: 691

I misread the question and answered as if it was a different sort of whitespace. In this case the white space appears inside the MIME word, not multiple ones separated by white space.

This sort of thing is explicitly disallowed. From the introduction to the format in RFC2047:

2. Syntax of encoded-words

   An 'encoded-word' is defined by the following ABNF grammar.  The
   notation of RFC 822 is used, with the exception that white space
   characters MUST NOT appear between components of an 'encoded-word'.

And then later on in the same section:

   IMPORTANT: 'encoded-word's are designed to be recognized as 'atom's
   by an RFC 822 parser.  As a consequence, unencoded white space
   characters (such as SPACE and HTAB) are FORBIDDEN within an
   'encoded-word'.  For example, the character sequence

      =?iso-8859-1?q?this is some text?=

   would be parsed as four 'atom's, rather than as a single 'atom' (by
   an RFC 822 parser) or 'encoded-word' (by a parser which understands
   'encoded-words').  The correct way to encode the string "this is some
   text" is to encode the SPACE characters as well, e.g.

      =?iso-8859-1?q?this=20is=20some=20text?=

   The characters which may appear in 'encoded-text' are further
   restricted by the rules in section 5.

Earlier answer

This sort of thing is explicitly allowed. Headers with MIME words should be 76 characters or less and folded if needed. RFC822 folded headers are indented second and any additional lines. RFC2047 headers are supposed to only indent one space. The whitespace between ?= on the first line and =? should be suppressed from output.

See the example on the bottom of page 12 of the RFC:

encoded form                                displayed as
---------------------------------------------------------------------
(=?ISO-8859-1?Q?a?=                         (ab)
   =?ISO-8859-1?Q?b?=)

       Any amount of linear-space-white between 'encoded-word's,
       even if it includes a CRLF followed by one or more SPACEs,
       is ignored for the purposes of display.

Upvotes: 2

Related Questions