PurpleVermont

Reputation: 1241

Extracting email Received: headers with the Python email package

I'd like to extract the final Received: header from an email message. I have the Message object as returned by email.message_from_file().

Neither Message.get() nor Message.__getitem__() guarantees which of the many Received: headers I will get. Message.get_all() returns them all, but doesn't guarantee an order. Is there a way to be guaranteed to get the last one?

Upvotes: 2

Views: 6007

Answers (3)

jef79m

Reputation: 136

In Python 3.6.7, the docstring of the get_all() method explicitly states that the values are returned in the same order as they appear in the message, so messageInstance.get_all('Received') should work fine.

def get_all(self, name, failobj=None):
    """Return a list of all the values for the named field.

    These will be sorted in the order they appeared in the original
    message, and may contain duplicates.  Any fields deleted and
    re-inserted are always appended to the header list.

    If no such fields exist, failobj is returned (defaults to None).
    """

Upvotes: 3

tripleee

Reputation: 189327

The HeaderParser class in email.parser returns a Message object, which has a dictionary-like interface but actually seems to return the headers in the order you expect.

from email.parser import HeaderParser

# open_filehandle is a file object opened in text mode
headers = HeaderParser().parse(open_filehandle, headersonly=True)
for key, value in headers.items():
    if key == 'Received':
        pass  # do things with the value

The parse method has a sister parsestr method which accepts a string instead of a file-like object. (In Python 3, a byte string needs BytesHeaderParser and its parsebytes method instead.)

If by "final" you mean the "newest", that will be the first one which matches the if so you can simply break after reading it. If by "final" you mean something else, you can implement that inside the if in whatever way you see fit.

This is adapted from this answer to a related question.

Upvotes: 2

hd1

Reputation: 34657

Received: headers are timestamped:

Received: from lb-ex1.int.icgroup.com (localhost [127.0.0.1])
    by lb-ex1.localdomain (Postfix) with ESMTP id D6BDB1E26393
    for <[email protected]>; Fri, 12 Dec 2014 12:09:24 -0500 (EST)

So call messageInstance.get_all('Received') and sort the resulting list however you see fit. An example of how to do this:

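import email.utils

def sort_key(received_header):
    # The timestamp is the part after the last ';' in a Received: header
    date_part = received_header.rsplit(';', 1)[-1]
    parsed = email.utils.parsedate_tz(date_part)
    # mktime_tz() folds the timezone offset into a UTC timestamp;
    # headers without a parseable date sort first
    return email.utils.mktime_tz(parsed) if parsed else 0

received_header_list = messageInstance.get_all('Received', [])
received_header_list.sort(key=sort_key)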

If it doesn't work, do leave a comment and I'll be happy to look into it further.

Upvotes: 2
