chmike

Reputation: 22174

MIME base64 encoding ambiguity in RFC 2045

According to the MIME base64 encoding specified in RFC 2045, the encoded data must be split into lines of at most 76 characters.

When decoding, all characters not belonging to the base64 alphabet must be ignored.

How do we determine the end of MIME base64 encoded data?

Upvotes: 1

Views: 590

Answers (1)

user2404501


When you've found the start of a base64 encoded object, it should always be possible to find the end without decoding it. Examples:

  • You might have an email message whose top-level encoding is base64. In that case, the end of the base64 data is the end of the body. The end of the body is recognized not by any internal structure, but by the line containing a lone "." that ends the SMTP DATA.
  • If you're reading an email message from an mbox file instead of receiving it via SMTP, the mbox format is responsible for telling you where the end of the message is.
  • If you have a multipart email body with one part base64, you can scan for the multipart boundary first to find the end of the body part, then pass the whole body part to the base64 decoder.
  • Similarly, if you have an RFC 2047 encoded header with base64, you can find the terminating ?= first, then pass the encoded portion to the base64 decoder.
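
As a rough illustration of the header case, here is a minimal Python sketch. The function name decode_b_encoded_word is made up for this example, and it assumes the encoded-word has already been isolated from the rest of the header line. It finds the closing ?= first and only then hands the encoded text to the decoder.

```python
import base64

def decode_b_encoded_word(word: str) -> str:
    """Decode one RFC 2047 encoded-word that uses the "B" (base64) encoding.

    Hypothetical helper for illustration; the layout handled here is
    =?charset?encoding?encoded-text?=
    """
    # Locate the delimiters first. The decoder never sees them.
    if not (word.startswith("=?") and word.endswith("?=")):
        raise ValueError("not an encoded-word")
    # "?" is not in the base64 alphabet, so splitting on it is unambiguous.
    charset, encoding, text = word[2:-2].split("?", 2)
    if encoding.upper() != "B":
        raise ValueError("only the B encoding is handled in this sketch")
    return base64.b64decode(text).decode(charset)
```

For example, decode_b_encoded_word("=?UTF-8?B?SGVsbG8=?=") gives "Hello".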

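Because the terminators are identified before base64 decoding begins, the decoder never sees them. The rule about ignoring characters that do not belong to the base64 alphabet therefore plays no part in finding the end of the data; it only affects what happens inside the span that has already been delimited.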
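
The multipart case can be sketched the same way. This is only an illustration under simplifying assumptions: the name extract_base64_part is hypothetical, the boundary string is assumed to be known already (normally it comes from the Content-Type header), the part is assumed to be base64, and boundary lines are matched more loosely than RFC 2046 specifies.

```python
import base64
import re

def extract_base64_part(body: str, boundary: str) -> bytes:
    """Decode the first body part of a multipart body (illustrative only)."""
    delimiter = "--" + boundary
    lines = body.splitlines()

    # Step 1: find where the part starts and ends, without decoding anything.
    start = lines.index(delimiter) + 1
    end = next(i for i in range(start, len(lines))
               if lines[i].startswith(delimiter))
    part = lines[start:end]

    # The part's own headers end at the first blank line.
    blank = part.index("")
    encoded = "".join(part[blank + 1:])

    # Step 2: decode. Characters outside the base64 alphabet are dropped,
    # but by now the boundary line is already out of the picture.
    cleaned = re.sub(r"[^A-Za-z0-9+/=]", "", encoded)
    return base64.b64decode(cleaned)
```

Note that the boundary line is removed by step 1, so it does not matter that "--" would have been silently skipped by the decoder.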

The two steps, finding the end of the base64 data and decoding it, can be combined into a single loop over the input for efficiency. But conceptually they are separate.
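
A combined loop might look roughly like the sketch below. The names decode_until and is_terminator are invented for this example; the caller supplies whatever test recognizes the enclosing format's terminator (a boundary line, a lone ".", and so on). Even in this form the terminator check runs before any character of the line reaches the decoder.

```python
import base64
import string

B64_ALPHABET = set(string.ascii_letters + string.digits + "+/=")

def decode_until(lines, is_terminator):
    """Decode base64 lines one at a time, stopping at the first terminator.

    Hypothetical helper: `lines` is any iterable of strings and
    `is_terminator` is a caller-supplied predicate.
    """
    out = bytearray()
    pending = ""  # alphabet characters not yet forming a full group of 4
    for line in lines:
        # The terminator is checked first, so it never reaches the decoder.
        if is_terminator(line):
            break
        # Ignore line breaks and any other character outside the alphabet.
        pending += "".join(c for c in line if c in B64_ALPHABET)
        # Decode only complete 4-character groups; keep the remainder.
        usable = len(pending) - len(pending) % 4
        out += base64.b64decode(pending[:usable])
        pending = pending[usable:]
    # Any leftover incomplete group is dropped in this sketch.
    return bytes(out)
```

This sketch assumes padding ("=") appears only at the very end of the data, as it does in well-formed input.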

Upvotes: 1
