Neel S

Reputation: 165

How to parse email contained in a text file using regex?

enter image description here

I have a text file containing text from emails, as shown above. I need to extract the values for E2, E1, and E0, and for each of those the From:, Sent:, To:, and Subject: fields. Can we do this with a regular expression?

We can use a regex like "^(From|Sent|To|Subject):(.*)" in Java. But is there a more comprehensive regex for the text in the example above?

Upvotes: 0

Views: 187

Answers (2)

Serge Ballesta

Reputation: 148880

I'm not sure this is related, but some mail readers (Thunderbird among others) store mails in a text file with a well-defined format:

  • the separator line begins with "From " (i.e. From followed by a space, not a colon): this line marks the beginning of a mail
  • the separator is followed by header lines. Each header line has the format HEADERNAME: value, where both HEADERNAME and value are arbitrary strings. HEADERNAME must not be preceded by a space, because a line beginning with a space is a continuation line
  • the header block is terminated by an empty line
  • everything that remains, up to the next From line, is the body of the mail
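The rules above can be sketched as a small hand-written parser. This is only an illustration under simplifying assumptions: the function name `parse_mbox_text` and the sample addresses are made up, repeated headers overwrite each other, and a body line that happens to start with "From " would be mistaken for a separator (real mbox files usually escape such lines).

```python
def parse_mbox_text(text):
    """Split mbox-style text into messages, following the rules listed above."""
    messages = []
    current = None      # message currently being built
    in_headers = False  # True while we are inside a header block
    last_name = None    # most recent header name, used for continuation lines

    for line in text.splitlines():
        if line.startswith("From "):
            # Separator line: a new mail begins here.
            current = {"headers": {}, "body": []}
            messages.append(current)
            in_headers = True
            last_name = None
        elif current is None:
            continue  # ignore anything before the first separator
        elif in_headers:
            if line == "":
                in_headers = False  # an empty line ends the header block
            elif line[0] in " \t" and last_name is not None:
                # Continuation line: append to the previous header's value.
                current["headers"][last_name] += " " + line.strip()
            elif ":" in line:
                name, _, value = line.partition(":")
                last_name = name
                current["headers"][name] = value.strip()
        else:
            current["body"].append(line)

    for msg in messages:
        msg["body"] = "\n".join(msg["body"])
    return messages


# Hypothetical sample data, invented for this sketch.
sample = (
    "From sender@example.com Mon Jan  1 00:00:00 2024\n"
    "From: sender@example.com\n"
    "To: receiver@example.com\n"
    "Subject: a long\n"
    " subject line\n"
    "\n"
    "Hello there.\n"
    "From other@example.com Tue Jan  2 00:00:00 2024\n"
    "Subject: second\n"
    "\n"
    "Bye.\n"
)

messages = parse_mbox_text(sample)
for msg in messages:
    print(msg["headers"], repr(msg["body"]))
```

Note that the parser never looks for specific header names; it only relies on the separator, the colon, the leading-space rule, and the empty line.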

If you are reading such a file, I strongly advise you not to rely on known header names but to parse the file according to the above rules or, even better, to use Python's mailbox module, which will do that for you and:

  • has been thoroughly tested
  • has many options to adapt to variations on mailbox format
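A minimal sketch of the mailbox route, assuming the file really is in mbox format. The file name `sample.mbox`, the addresses, and the message text are invented for the example; the sample is written to a temporary directory only so that the snippet is self-contained.

```python
import mailbox
import os
import tempfile

# Hypothetical mbox content, invented for this sketch.
sample = (
    "From sender@example.com Mon Jan  1 00:00:00 2024\n"
    "From: sender@example.com\n"
    "To: receiver@example.com\n"
    "Subject: first\n"
    "\n"
    "Hello there.\n"
    "\n"
    "From other@example.com Tue Jan  2 00:00:00 2024\n"
    "From: other@example.com\n"
    "To: receiver@example.com\n"
    "Subject: second\n"
    "\n"
    "Bye.\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.mbox")
    with open(path, "w", newline="\n") as f:
        f.write(sample)

    box = mailbox.mbox(path)
    try:
        # Each item behaves like an email.message.Message:
        # headers are read by name, the body with get_payload().
        extracted = [
            (msg["From"], msg["To"], msg["Subject"], msg.get_payload().strip())
            for msg in box
        ]
    finally:
        box.close()

for sender, recipient, subject, body in extracted:
    print(sender, recipient, subject, body)
```

Header lookup by name is case-insensitive, and a header that is absent comes back as None, so there is no need to hard-code which headers a message must contain.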

Upvotes: 1

MiiinimalLogic

Reputation: 818

Take a look at the raw message source. You'll see that there should be a uniform first header, and that a blank line, and only a blank line, always separates the headers from the actual message (the part you want).

You can create a regex to look for the first blank line after the first header, then extract the body.
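As a rough sketch of that idea, assuming a single message whose header block ends at the first blank line. The sample text and the header pattern are illustrative assumptions, not taken from the question's file, and continuation lines are not handled.

```python
import re

# Hypothetical message text, invented for this sketch.
raw = (
    "From: sender@example.com\n"
    "Sent: Monday, January 1, 2024 10:00 AM\n"
    "To: receiver@example.com\n"
    "Subject: Status update\n"
    "\n"
    "Hi,\n"
    "\n"
    "Here is the update.\n"
)

# Split once at the first blank line (tolerating \r\n line endings).
parts = re.split(r"\r?\n\r?\n", raw, maxsplit=1)
header_block = parts[0]
body = parts[1] if len(parts) > 1 else ""

# Collect "Name: value" pairs from the header block without
# assuming which header names are present.
headers = dict(
    re.findall(r"^([A-Za-z-]+):[ \t]*(.*)$", header_block, flags=re.MULTILINE)
)

print(headers)
print(body)
```

Because of `maxsplit=1`, blank lines inside the body are left alone; only the first one is treated as the header/body boundary.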

Upvotes: 1
