Reputation: 1763
I have a directory of text files like so:
listedname_ [email protected]__subject_date.eml
The filenames are guaranteed to have this pattern during the "email" portion:
[email protected]_
So, email, at-sign, domain, period, tld, underscore. (They will not always have a leading underscore, due to not everyone setting their "name" in their email client.)
There are ~1,000 files in a directory on Windows, though I have Cygwin tools installed and can navigate to the directory. Each file also contains a line guaranteed to look like this:
From: "Bob Lawblog" <[email protected]>
What I want to do is use grep or whatever tool to return a list of email addresses and nothing more, in this format:
[email protected] <line break>
[email protected] <line break>
[email protected] <line break>
No leading or trailing underscores, no email bodies, no subjects, etc. (Getting it in a comma-separated list would be awesome too, but not necessary.)
Can someone help me with the regex/grep command for it? Thanks!
Upvotes: 1
Views: 330
Reputation: 54502
I think I have understood your question; correct me if I'm wrong. It seems you have two options to get the email addresses:
1. Parse the name of each file.
2. Parse the From: line in each file.
I like the second option the most, as finding a regex to match an email address in a filename such as listedname_ [email protected]__subject_date.eml will be tricky: what if the email address itself contains underscores?
To get a list of email addresses from within each file, try this:
awk '/^From:/ { print substr($NF,2,length($NF)-2) }' *.eml > outfile
If you'd prefer a CSV of these email addresses, use printf (note that the output ends with a trailing comma):
awk '/^From:/ { printf "%s,", substr($NF,2,length($NF)-2) } END { printf "\n" }' *.eml > outfile
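A slightly more defensive variant of the same idea, as a sketch only. It assumes the address always sits between angle brackets on the From: line, as in the question, and that some files may have Windows (CRLF) line endings, in which case the last field would carry a stray carriage return. The function names extract_from_addrs and extract_from_addrs_csv are made up for illustration.

```shell
# Hypothetical helper: print the address found between < and > on every
# line that starts with "From:", one address per line.
extract_from_addrs() {
  awk '/^From:/ {
    sub(/\r$/, "")                  # drop a trailing carriage return (CRLF files)
    if (match($0, /<[^<>]+>/))      # first <...> group on the line
      print substr($0, RSTART + 1, RLENGTH - 2)
  }' "$@"
}

# Hypothetical helper: same addresses joined with commas, no trailing comma.
extract_from_addrs_csv() {
  extract_from_addrs "$@" | paste -sd, -
}
```

Usage would be along the lines of extract_from_addrs *.eml > outfile, or extract_from_addrs_csv *.eml > outfile for the comma-separated form. Like the one-liners above, this prints every line beginning with From:, so a file that quotes another message's headers at the start of a line may contribute more than one address.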
Upvotes: 1