Reputation: 797
We need to parse Received: email headers according to RFC 5321. We need to extract the domains or IPs through which the mail has traversed. We also need to figure out whether an IP is an internal IP.
Is there already a library which can help out, preferably in C/C++?
For example:
Received: from server.mymailhost.com (mail.mymailhost.com [126.43.75.123])
by pilot01.cl.msu.edu (8.10.2/8.10.2) with ESMTP id NAA23597;
Fri, 12 Jul 2002 16:11:20 -0400 (EDT)
We need to extract the "by" server.
Upvotes: 8
Views: 7325
Reputation: 33
#include <string.h>

#define MAX_HEADERS 30

typedef struct mailHeaders {
    char name[100];
    char value[2000];
} mailHeaders;

int header_count = 0;
mailHeaders headers[MAX_HEADERS]; // Holds the name/value pairs

// Return the value of the first header called `name`, or NULL if there is none.
// A message usually carries several Received headers; this returns only the first.
char *GetMailHeader(const char *name)
{
    int i;
    for (i = 0; i < header_count; i++) {
        if (strcmp(name, headers[i].name) == 0)
            return headers[i].value;
    }
    return NULL;
}

// Walk the message line by line and save the header name/value pairs in headers[].
// `mail` is the buffer holding the message; strtok modifies it in place.
void ReadMail(char *mail)
{
    char *Received = NULL; // Received header
    char *line = NULL;     // A line of text in the email
    char *value = NULL;    // Header value
    int index = -1;        // Index of the header being filled in

    memset(headers, 0, sizeof(headers)); // Clear the whole array, not just one entry
    header_count = 0;
    line = strtok(mail, "\n");
    while (line != NULL)
    {
        if (*line == '\r') // Blank CRLF line: end of headers
            break;
        if ((*line == '\t' || *line == ' ') && index >= 0) // Folded (continuation) line
        {
            // Append to the current header without overflowing its buffer
            strncat(headers[index].value, line,
                    sizeof(headers[index].value) - strlen(headers[index].value) - 1);
        }
        else if ((value = strchr(line, ':')) != NULL && index + 1 < MAX_HEADERS)
        {
            *value = '\0'; // Replace the colon with a terminator to split name and value
            value++;       // Step past it to the start of the value
            index++;
            // The array was zeroed, so copying size - 1 bytes keeps the terminator
            strncpy(headers[index].name, line, sizeof(headers[index].name) - 1);
            strncpy(headers[index].value, value, sizeof(headers[index].value) - 1);
            header_count = index + 1; // A count, not the last index
        }
        line = strtok(NULL, "\n"); // Get next line
    }
    Received = GetMailHeader("Received");
    (void)Received; // Parse the Received value from here
}
Upvotes: 0
Reputation: 6972
There is a Perl Received module which is a fork of the SpamAssassin code. It returns a hash for a Received header with the relevant information. For example:
{ ip => '64.12.136.4',
id => '875522',
by => 'xxx.com',
helo => 'imo-m01.mx.aol.com' }
Upvotes: 4
Reputation: 31630
You could use a regular expression, for example:
(?<=by).*(?=with)
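For the header in the question this will give you pilot01.cl.msu.edu (8.10.2/8.10.2), with a leading and a trailing space, since the lookarounds exclude only "by" and "with".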
Edit: I find it amusing that this was modded down when it actually gets what the OP asked for.
C#:
string header = "Received: from server.mymailhost.com (mail.mymailhost.com [126.43.75.123]) by pilot01.cl.msu.edu (8.10.2/8.10.2) with ESMTP id NAA23597; Fri, 12 Jul 2002 16:11:20 -0400 (EDT)";
System.Text.RegularExpressions.Regex r = new System.Text.RegularExpressions.Regex(@"(?<=by).*(?=with)");
System.Text.RegularExpressions.Match m = r.Match(header);
Console.WriteLine(m.Captures[0].Value);
Console.ReadKey();
I didn't claim that it was complete, but I am wondering whether the person who gave it a -1 even tried it. Meh..
Upvotes: 1
Reputation: 536557
The format used by 'Received' lines is defined in RFC 2821 (since superseded by RFC 5321), and regex can't parse it.
(You can try anyway, and for a limited subset of headers produced by known software you might succeed, but when you attach this to the range of strange stuff found in real-world mail it will fail.)
Use an existing RFC 2821 parser and you should be OK, but otherwise you should expect failure, and write the software to cope with it. Don't base anything important like a security system around it.
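To illustrate why a plain regex struggles: comments in parentheses can nest and can themselves contain words like "by". Below is a minimal sketch of a hand-written scanner that skips such comments and reads keyword/value pairs. The names skip_cfws, token_end and received_clause are made up for this example, it assumes the header has already been unfolded (or at least that "Received:" has been stripped), and it uses POSIX strncasecmp. It is not a full RFC 5321 parser, and it reports failure instead of guessing.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>
#include <strings.h>

/* Skip whitespace and (possibly nested) parenthesised comments. */
static const char *skip_cfws(const char *p)
{
    for (;;) {
        while (*p && isspace((unsigned char)*p))
            p++;
        if (*p != '(')
            return p;
        int depth = 0;
        do {
            if (*p == '\\' && p[1])
                p++;            /* quoted-pair: skip the escaped character */
            else if (*p == '(')
                depth++;
            else if (*p == ')')
                depth--;
            p++;
        } while (*p && depth > 0);
    }
}

/* Advance to the end of a token (stops at whitespace, '(' or ';'). */
static const char *token_end(const char *p)
{
    while (*p && !isspace((unsigned char)*p) && *p != '(' && *p != ';')
        p++;
    return p;
}

/* Hypothetical helper: copy the value that follows keyword `kw`
   ("from", "by", "with", ...) into out. Returns 1 on success,
   0 if the keyword is absent or the value does not fit. */
static int received_clause(const char *value, const char *kw,
                           char *out, size_t outsz)
{
    assert(value != NULL && kw != NULL && out != NULL);
    size_t kwlen = strlen(kw);
    const char *p = skip_cfws(value);

    while (*p && *p != ';') {           /* ';' starts the date part */
        const char *kw_end = token_end(p);
        const char *val = skip_cfws(kw_end);
        const char *val_end = token_end(val);

        if ((size_t)(kw_end - p) == kwlen && strncasecmp(p, kw, kwlen) == 0) {
            size_t len = (size_t)(val_end - val);
            if (len == 0 || len >= outsz)
                return 0;
            memcpy(out, val, len);
            out[len] = '\0';
            return 1;
        }
        p = skip_cfws(val_end);
    }
    return 0;
}
```

Calling received_clause(value, "by", buf, sizeof buf) on the header from the question should leave pilot01.cl.msu.edu in buf. Headers from software that does not follow the keyword/value layout will simply return 0, which the caller has to handle.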
> We need to extract the "by" server.
'from' is more likely to be of use. The hostname given in a 'by' clause is as seen by the host itself, so there is no guarantee it will be a publicly resolvable FQDN. And of course you don't tend to get valid (TCP-Info) there.
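As a rough sketch of pulling the TCP-info out of a 'from' clause: the address literal usually appears in square brackets inside the comment, as in [126.43.75.123]. The function name first_bracketed_ip is made up for this example. It assumes a POSIX system (inet_pton, strncasecmp) and simply takes the first bracketed text that parses as an IPv4 or IPv6 address, so it does not check which clause the brackets belong to.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper: copy the first [address] that is a valid
   IPv4 or IPv6 literal into out. Returns 1 if found, 0 otherwise. */
static int first_bracketed_ip(const char *value, char *out, size_t outsz)
{
    assert(value != NULL && out != NULL);
    const char *open = strchr(value, '[');

    while (open != NULL) {
        const char *close = strchr(open, ']');
        if (close == NULL)
            return 0;                       /* unterminated bracket */

        size_t len = (size_t)(close - open - 1);
        if (len > 0 && len < outsz) {
            unsigned char buf[sizeof(struct in6_addr)];

            memcpy(out, open + 1, len);
            out[len] = '\0';
            /* RFC 5321 writes IPv6 literals as [IPv6:...]; drop the tag. */
            if (strncasecmp(out, "IPv6:", 5) == 0)
                memmove(out, out + 5, len - 5 + 1);
            if (inet_pton(AF_INET, out, buf) == 1 ||
                inet_pton(AF_INET6, out, buf) == 1)
                return 1;
        }
        open = strchr(close, '[');          /* try the next bracket pair */
    }
    return 0;
}
```

The address is only as trustworthy as the server that wrote the header, so anything below the first hop you control can be forged.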
Upvotes: 5
Reputation: 9557
You can use regular expressions. It would look like this (not tested):
#include <errno.h>
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
regex_t re;
/* Group 1: host name after "by"; group 2: the parenthesised comment after it.
   Backslashes are doubled because this is a C string literal. */
const char *restr = "by ([A-Za-z0-9.-]+) \\(([^)]*)\\)";
check(regcomp(&re, restr, REG_EXTENDED | REG_ICASE) != 0, "regcomp");
regmatch_t matches[3]; /* whole match plus the two groups */
int ret = regexec(&re, YOUR_STRING, 3, matches, 0);
check(ret != 0, "regexec"); /* REG_NOMATCH also ends up here */
size_t size;
size = matches[1].rm_eo - matches[1].rm_so;
char *host = malloc(size + 1); /* +1 for the terminator */
memcpy(host, YOUR_STRING + matches[1].rm_so, size);
host[size] = '\0';
size = matches[2].rm_eo - matches[2].rm_so;
char *info = malloc(size + 1); /* "8.10.2/8.10.2" for the question's header, not an IP */
memcpy(info, YOUR_STRING + matches[2].rm_so, size);
info[size] = '\0';
regfree(&re);
check is a macro to help you figure out if there are any problems:
#define check(condition, description) if (condition) { fprintf(stdout, "%s:%i - %s - %s\n", __FILE__, __LINE__, description, strerror(errno)); exit(1); }
Upvotes: 0
Reputation: 2655
VMime should be fine; more or less any mail library will allow you to do that.
Upvotes: 2
Reputation: 34810
Have you considered using regular expressions?
Here is a list of internal, non-routable address ranges.
Upvotes: -2