Michael B

Reputation: 1763

Regex - match emails in filenames, return list of email addresses

I have a directory of text files like so:

listedname_ [email protected]__subject_date.eml

The filenames are guaranteed to have this pattern during the "email" portion:

[email protected]_

So, email, at-sign, domain, period, tld, underscore. (They will not always have a leading underscore, due to not everyone setting their "name" in their email client.)

There are ~1,000 files in a directory on Windows, though I have Cygwin tools installed and can navigate to the directory. Each file also contains a line guaranteed to look like this:

From: "Bob Lawblog" <[email protected]>

What I want to do is use grep or whatever tool to return a list of email addresses and nothing more, in this format:

[email protected] <line break>
[email protected] <line break>
[email protected] <line break>

No leading or trailing underscores, no email bodies, no subjects, etc. (Getting it in a comma-separated list would be awesome too, but not necessary.)

Can someone help me with the regex/grep command for it? Thanks!

Upvotes: 1

Views: 330

Answers (1)

Steve

Reputation: 54502

I think I have understood your question. Correct me if I'm wrong. It seems you have two options to 'get' the email addresses:

  1. Use the file name, and apply regex.
  2. Use the From: line in each file to get the desired email addresses.

I prefer the second option, because finding a regex that pulls the email address out of a name like listedname_ [email protected]__subject_date.eml will be tricky: what if the email address itself contains underscores?
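For completeness, here is a rough sketch of option 1. It only works under the assumption that the local part of the address (the bit before the @) never contains an underscore or a space, which is exactly the case that makes this approach fragile. The function name emails_from_names is made up for illustration, and grep -o is a GNU/BSD extension (available under Cygwin):

```shell
# Hypothetical helper: print the address embedded in each .eml filename
# found in the directory given as $1.
# ASSUMPTION: the local part contains no "_" or space, so the match starts
# right after the last "_" or space before the "@". Domains cannot contain
# "_", so the underscore after the TLD ends the match.
emails_from_names() {
    for f in "$1"/*.eml; do
        [ -e "$f" ] || continue      # skip the literal pattern if nothing matched
        basename "$f"
    done | grep -oE '[A-Za-z0-9.+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+'
}
```

An address such as first_last@... would come out truncated to last@..., so treat this as a fallback only.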

To get a list of email addresses from within each file, try this:

awk '/^From:/ { print substr($NF,2,length($NF)-2) }' *.eml > outfile

If you'd prefer a csv of these email addresses, use printf:

awk '/^From:/ { printf "%s%s", sep, substr($NF,2,length($NF)-2); sep = "," } END { printf "\n" }' *.eml > outfile
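One caveat: the files come from Windows, so they may well have CRLF line endings, in which case the last field carries a trailing carriage return and the substr arithmetic above leaves the closing > in place. Here is a slightly more defensive sketch, assuming the address is always wrapped in angle brackets as in your example. The function name from_addresses is hypothetical:

```shell
# Hypothetical helper: print the From: address of each .eml file found in
# the directory given as $1, one address per line.
# ASSUMPTION: the address always appears between "<" and ">".
from_addresses() {
    awk '/^From:/ {
        sub(/\r$/, "")                 # drop a trailing CR from CRLF files
        if (match($0, /<[^<>]+>/))     # locate the <address> part
            print substr($0, RSTART + 1, RLENGTH - 2)
    }' "$1"/*.eml
}

# One address per line:
#   from_addresses /path/to/dir > outfile
# Comma-separated on a single line, without a trailing comma:
#   from_addresses /path/to/dir | paste -sd, - > outfile
```

Any line that begins with From: is matched, so a file with more than one such line will yield more than one address.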

Upvotes: 1
