donald

Reputation: 23737

Node.js: Regular expressions to get e-mail headers and body

I know very little about regular expressions, and I'm having trouble extracting the information I need from an e-mail. I'd like help reading these fields: "status", "to", "from", "subject" and "body".

The e-mail has failed, details:

Action: failed
Status: 5.0.0 (permanent failure)

---------- Forwarded message ----------
From: [email protected]
To: [email protected]
Date: Tue, 12 Apr 2011 13:55:23 +0000
Subject: test
hellloooooo

What's the best way to do it using JavaScript?

Thanks

Upvotes: 0

Views: 2008

Answers (2)

josh3736

Reputation: 144912

A regular expression is probably not the best tool for this job. What you really want is a library that properly parses RFC 2822 email messages, especially since you want to extract the body. If you look at the spec, you'll see that parsing an email involves a lot of complexity (text encodings, MIME, etc.).

Using mailparser:

var mailparser = require("./mailparser"),
    fs = require("fs"),
    util = require("util"); // "sys" is a deprecated alias for "util"

fs.readFile('mail.txt', function (err, data) {
    if (err) throw err;

    var mp = new mailparser.MailParser();

    // callback for the headers object
    mp.on("headers", function(headers){
        console.log("HEADERS");
        console.log(util.inspect(headers, false, 5));
    });

    // callback for the body object
    mp.on("body", function(body){
        console.log("BODY");
        console.log(util.inspect(body, false, 7));
    });

    // feed the raw message text to the parser, then signal the end of input
    mp.feed(data.toString("ascii"));
    mp.end();
});
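As one small illustration of that complexity: RFC 2822 allows a long header to be "folded" across several lines, where each continuation line starts with a space or tab. A line-by-line regex would only see the first part of such a header. The sketch below shows the unfolding step only; `unfoldHeaders` is a hypothetical helper written for this example, not part of mailparser, and it does not handle encodings or MIME.

```javascript
// Hypothetical helper (not a mailparser API): undo RFC 2822 header folding.
function unfoldHeaders(headerBlock) {
    // A line break followed by a space or tab continues the previous header,
    // so remove the line break and keep the whitespace.
    return headerBlock.replace(/\r?\n(?=[ \t])/g, "");
}

// Made-up header block with a folded Subject line.
var folded = "Subject: a very long\r\n subject line\r\nTo: someone";
console.log(unfoldHeaders(folded).split(/\r?\n/));
```

A real parser performs this step, and many others, before it hands you the headers.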

Upvotes: 3

Raynos

Reputation: 169391

Assuming that these fields are as simple and consistent as

[\n] From: [...][\n]

then an expression like

/[\n]( From: ).+[\n]/

would work for you. Replace ( From: ) with ( Date: ), and so on for the other fields.

Then call string.match(regExp) to get the match.

Update:

var bodyRegex = /[\n] Subject: (.+)[\n](.+)/;
var string = ...;
var result = string.match(bodyRegex);
result[1]; // Subject
result[2]; // Body (first line only, since . does not match newlines)
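Putting the same idea together for all five fields, here is a minimal sketch. It assumes the message always looks like the one in the question: one header per line, with the body being everything after the Subject line. `parseBounce` and `header` are names made up for this example, and the addresses in `sample` are placeholders.

```javascript
// Hypothetical helper: pull fields out of a message shaped like the one in the question.
function parseBounce(text) {
    // Return the rest of the line after "Name:", or null if no such line exists.
    // The "m" flag makes ^ and $ match at every line, not just the whole string.
    // Assumption: name contains no regex metacharacters.
    function header(name) {
        var m = text.match(new RegExp("^" + name + ":[ \\t]*(.*)$", "m"));
        return m ? m[1].trim() : null;
    }

    // Assumption: the body is everything that follows the Subject line.
    // [\s\S] is used because . does not match line breaks.
    var body = text.match(/^Subject:.*\r?\n([\s\S]*)$/m);

    return {
        status: header("Status"),
        from: header("From"),
        to: header("To"),
        subject: header("Subject"),
        body: body ? body[1].trim() : null
    };
}

// Sample modelled on the question; the addresses are placeholders.
var sample = [
    "The e-mail has failed, details:",
    "",
    "Action: failed",
    "Status: 5.0.0 (permanent failure)",
    "",
    "---------- Forwarded message ----------",
    "From: sender@example.com",
    "To: recipient@example.com",
    "Date: Tue, 12 Apr 2011 13:55:23 +0000",
    "Subject: test",
    "hellloooooo"
].join("\n");

console.log(parseBounce(sample));
```

This only holds while the layout stays that simple. Folded headers, a blank line before the body, or MIME parts would all need more than a regex.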

Upvotes: 0
