ravi

Reputation: 797

Parsing email "Received:" headers

We need to parse Received: email headers according to RFC 5321 and extract the domains or IPs the mail has passed through. We also need to determine whether a given IP is an internal IP.

Is there already a library that can help, preferably in C/C++?

For example:

Received: from server.mymailhost.com (mail.mymailhost.com [126.43.75.123])
    by pilot01.cl.msu.edu (8.10.2/8.10.2) with ESMTP id NAA23597;
    Fri, 12 Jul 2002 16:11:20 -0400 (EDT)

We need to extract the "by" server.

Upvotes: 8

Views: 7325

Answers (8)

Marcus Carey

Reputation: 33

#include <stdio.h>
#include <string.h>

#define MAX_HEADERS 30

typedef struct mailHeaders {
    char name[100];
    char value[2000];
} mailHeaders;

int header_count = 0;
mailHeaders headers[MAX_HEADERS];    // Holds the name/value pairs

// Return the value of the first header named `name`, or NULL if there is none.
char *GetMailHeader(const char *name)
{
    int i;

    for (i = 0; i < header_count; i++) {
        if (strcmp(name, headers[i].name) == 0)
            return headers[i].value;
    }
    return NULL;
}

// Loop through the email message line by line and save each header as a
// name/value pair. `mail` must be a writable buffer: strtok() modifies it.
void ReadMail(char *mail)
{
    char *line;         // A line of text in the email
    char *value;        // Header value
    size_t used;        // Bytes already stored in the current value
    int index = -1;     // Header index

    memset(headers, '\0', sizeof(headers));   // Clear the whole array
    header_count = 0;

    // strtok() skips empty tokens, so the end of the headers is only
    // detected when lines end in CRLF and the blank line is a lone '\r'.
    for (line = strtok(mail, "\n"); line != NULL; line = strtok(NULL, "\n"))
    {
        if (*line == '\r')    // End of headers
            break;

        line[strcspn(line, "\r")] = '\0';   // Drop the trailing CR

        if (*line == ' ' || *line == '\t')  // Folded (continuation) line
        {
            if (index >= 0) {
                used = strlen(headers[index].value);
                strncat(headers[index].value, line,
                        sizeof(headers[index].value) - used - 1);
            }
            continue;
        }

        value = strchr(line, ':');   // Split the name/value pair
        if (value != NULL && index + 1 < MAX_HEADERS)
        {
            *value++ = '\0';         // Overwrite the colon and step past it
            index++;
            snprintf(headers[index].name, sizeof(headers[index].name), "%s", line);
            snprintf(headers[index].value, sizeof(headers[index].value), "%s", value);
            header_count = index + 1;
        }
    }
}

// After ReadMail(), GetMailHeader("Received") returns the first Received header.

Upvotes: 0

karlcow

Reputation: 6972

There is a Perl Received module which is a fork of the SpamAssassin code. It returns a hash with the relevant information from a Received header. For example:

{ ip => '64.12.136.4', 
  id => '875522', 
  by => 'xxx.com',
  helo => 'imo-m01.mx.aol.com' }

Upvotes: 4

Ta01

Reputation: 31630

You could use a regular expression with lookarounds:

(?<=by).*(?=with)

For the header in the question this gives you pilot01.cl.msu.edu (8.10.2/8.10.2), with the surrounding whitespace still attached.

Edit: I find it amusing that this was modded down when it actually gets what the OP asked for.

C#:

string header = "Received: from server.mymailhost.com (mail.mymailhost.com [126.43.75.123]) by pilot01.cl.msu.edu (8.10.2/8.10.2) with ESMTP id NAA23597; Fri, 12 Jul 2002 16:11:20 -0400 (EDT)";
System.Text.RegularExpressions.Regex r = new System.Text.RegularExpressions.Regex(@"(?<=by).*(?=with)");
System.Text.RegularExpressions.Match m = r.Match(header);
Console.WriteLine(m.Captures[0].Value);
Console.ReadKey();

I didn't claim that it was complete, but I am wondering whether the person who gave it a -1 even tried it. Meh.

Upvotes: 1

bobince

Reputation: 536557

The format used by 'Received' lines is defined in RFC 2821, and regex can't parse it.

(You can try anyway, and for a limited subset of headers produced by known software you might succeed, but when you attach this to the range of strange stuff found in real-world mail it will fail.)

Use an existing RFC 2821 parser and you should be OK, but otherwise you should expect failure, and write the software to cope with it. Don't base anything important like a security system around it.

> We need to extract the "by" server.

'from' is more likely to be of use. The hostname given in a 'by' clause is as seen by the host itself, so there is no guarantee it will be a publicly resolvable FQDN. And of course you don't tend to get valid (TCP-Info) there.
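As a rough illustration of pulling the TCP-Info address out of the 'from' clause, here is a minimal sketch in C. It assumes the common layout shown in the question, where the address sits in square brackets before the 'by' clause, and it reports failure for anything else rather than guessing. The function name extract_bracketed_ip is made up for this example and does not come from any library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the text between the first '[' and the
 * following ']' into `out`, but only if both appear before the " by "
 * clause. Returns 0 on success, -1 if nothing usable is found or the
 * text does not fit in `out`. An IPv6 literal comes back with its
 * "IPv6:" tag still attached. */
int extract_bracketed_ip(const char *header, char *out, size_t outlen)
{
    const char *limit = strstr(header, " by ");   /* end of 'from' clause */
    const char *open = strchr(header, '[');
    if (open == NULL || (limit != NULL && open > limit))
        return -1;

    const char *close = strchr(open + 1, ']');
    if (close == NULL || (limit != NULL && close > limit))
        return -1;

    size_t len = (size_t)(close - open - 1);
    if (len == 0 || len >= outlen)
        return -1;

    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}
```

The caller still has to treat the result as untrusted text: validate it as an address before using it, and be prepared for the -1 case on headers written by unusual software.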

Upvotes: 5

Tiago

Reputation: 9557

You can use regular expressions. It would look something like this (not tested):

#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Inside a function, with YOUR_STRING holding the header text: */

regex_t re;

/* Group 1: host name after "by"; group 2: the parenthesised comment. */
const char *restr = "by ([A-Za-z0-9.-]+) \\(([^)]*)\\)";

check(regcomp(&re, restr, REG_EXTENDED | REG_ICASE) != 0, "regcomp");

/* matches[0] is the whole match; matches[1] and matches[2] are the groups. */
size_t nmatch = 3;
regmatch_t matches[3];

check(regexec(&re, YOUR_STRING, nmatch, matches, 0) != 0, "regexec");

size_t size;

size = matches[1].rm_eo - matches[1].rm_so;
char *host = malloc(size + 1);    /* +1 for the terminating NUL */
memcpy(host, YOUR_STRING + matches[1].rm_so, size);
host[size] = '\0';

size = matches[2].rm_eo - matches[2].rm_so;
char *info = malloc(size + 1);
memcpy(info, YOUR_STRING + matches[2].rm_so, size);
info[size] = '\0';

regfree(&re);

check is a macro that reports a failure and exits; define it before the code above:

#define check(condition, description) if (condition) { fprintf(stderr, "%s:%i - %s\n", __FILE__, __LINE__, description); exit(1); }

Upvotes: 0

Axelle Ziegler

Reputation: 2655

vmime should be fine; more or less any mail library will let you do that.

Upvotes: 2

Dave Swersky

Reputation: 34810

Have you considered using regular expressions?

Here is a list of internal, non-routable address ranges.
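For the internal-IP part of the question, a minimal sketch in C follows. It assumes "internal" means the RFC 1918 private ranges plus loopback and link-local, and it handles IPv4 only; adjust the list if your network defines internal differently. The function name is_internal_ipv4 is made up for this example, while inet_pton and ntohl are the standard POSIX calls.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 1 if `text` is a private, loopback or
 * link-local IPv4 address, 0 if it is any other valid IPv4 address,
 * and -1 if it does not parse as dotted-quad IPv4. */
int is_internal_ipv4(const char *text)
{
    struct in_addr addr;
    if (inet_pton(AF_INET, text, &addr) != 1)
        return -1;

    uint32_t ip = ntohl(addr.s_addr);   /* convert to host byte order */

    if ((ip & 0xFF000000u) == 0x0A000000u) return 1;  /* 10.0.0.0/8     */
    if ((ip & 0xFFF00000u) == 0xAC100000u) return 1;  /* 172.16.0.0/12  */
    if ((ip & 0xFFFF0000u) == 0xC0A80000u) return 1;  /* 192.168.0.0/16 */
    if ((ip & 0xFF000000u) == 0x7F000000u) return 1;  /* 127.0.0.0/8    */
    if ((ip & 0xFFFF0000u) == 0xA9FE0000u) return 1;  /* 169.254.0.0/16 */
    return 0;
}
```

Called on the address from the question, is_internal_ipv4("126.43.75.123") returns 0, since that address falls in none of the ranges above.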

Upvotes: -2

Keltia

Reputation: 14743

It is not difficult to parse such headers, even manually line-by-line. A regex could help there by looking at by\s+(\w)+\(. For C++, you could try that library or that one.
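A manual scan along those lines might look like the following sketch in C. It assumes the whole (unfolded or folded) Received header is available as one string, skips anything inside parenthesised comments, and takes the token after the standalone word "by". The name extract_by_host is made up for this example, and the sketch does not attempt to cover every form the RFC allows.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: finds "by" as a whole word outside ( ) comments
 * and copies the token that follows it into `out`.
 * Returns 0 on success, -1 if no such token is found or it does not fit. */
int extract_by_host(const char *header, char *out, size_t outlen)
{
    int depth = 0;                       /* nesting depth of ( ) comments */
    const char *p;

    for (p = header; *p != '\0'; p++) {
        if (*p == '(') { depth++; continue; }
        if (*p == ')') { if (depth > 0) depth--; continue; }
        if (depth > 0) continue;         /* ignore text inside comments */

        /* "by" must start a word and be followed by whitespace */
        if ((p[0] == 'b' || p[0] == 'B') &&
            (p[1] == 'y' || p[1] == 'Y') &&
            isspace((unsigned char)p[2]) &&
            (p == header || isspace((unsigned char)p[-1])))
        {
            const char *start = p + 2;
            while (isspace((unsigned char)*start))
                start++;

            /* the host token ends at whitespace, a comment or ';' */
            size_t len = strcspn(start, " \t\r\n(;");
            if (len == 0 || len >= outlen)
                return -1;

            memcpy(out, start, len);
            out[len] = '\0';
            return 0;
        }
    }
    return -1;
}
```

Tracking the comment depth keeps a "by" that appears inside parentheses from being mistaken for the clause keyword, which is one of the cases a short regex tends to get wrong.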

Upvotes: -2
