Ben Webb

Reputation: 9

Some headers missing from net/mail parsed email

I'm writing a Go application that parses raw emails to return the content in plain text, and also a map of all the headers.

I've found that the net/mail package works pretty well for this, and have had some good results so far. However, it seems that some headers from the original email may be missing.

For example, in a relatively large multipart email I'm using in a test case, my unit test tells me that the "Message-ID" header is returned as empty even though I know it's defined in the original raw email.

Here's the relevant code that specifically handles parsing out the headers:

func extractAllHeaders(header mail.Header) (map[string][]string, error) {
    headers := make(map[string][]string)
    for k, v := range header {
        for _, val := range v {
            decodedValue, err := decodeHeader(val)
            if err != nil {
                return nil, err
            }
            headers[k] = append(headers[k], decodedValue)
        }
    }
    // for some reason Message-ID may not be found in the above loop, and we need to get it manually...?
    if _, exists := headers["Message-ID"]; !exists {
        headers["Message-ID"] = []string{header.Get("Message-ID")}
    }
    return headers, nil
}

func decodeHeader(header string) (string, error) {
    dec := new(mime.WordDecoder)
    return dec.DecodeHeader(header)
}

I thought that the for loop that goes over header (which is from the net/mail package, type map[string][]string) would effectively get all the existing headers - but apparently not.

Eventually, I tried just getting the header directly from the Header object using the Get method, and that does work!

So, at this point, I guess I have two questions:

  1. Why doesn't the for loop I have there already retrieve the "Message-ID" header (and some other headers besides this specific one), but Get seems to work fine?

  2. Is there a way to ensure I get all the headers that exist in an email? I don't think I can rely on Get since it requires I know all the key strings/header names, which I don't.

(If you want to see the raw email content, I can find a place to upload it, but it's pretty big so I left it out of this post for now.)

Upvotes: 0

Views: 86

Answers (1)

Steffen Ullrich

Reputation: 123320

Mail header field names are case-insensitive, and Get accounts for this. Your code instead expects one specific spelling, Message-ID, rather than the equally valid Message-Id, message-id, and so on.

That expectation is wrong. Ranging over header yields the field names not as they were written in the mail but in the canonical form produced by textproto.CanonicalMIMEHeaderKey. This function "... converts the first letter and any letter following a hyphen to upper case; the rest are converted to lowercase." So the map key is Message-Id, not the Message-ID you expected. Your loop does retrieve the header, just under that key. Get converts its argument to the same canonical form before checking the map, which is why header.Get("Message-ID") works.

Upvotes: 1
