Reputation: 6119
I have multiple files in a folder, and each of them contains one email message. Each message has a header in the following format:
Subject: formatting fonts
To: [email protected]
From: sender name
message body
I want to get all the unique sender names from all the messages (there is only one message per file). How can I do that?
Upvotes: 2
Views: 4499
Reputation: 2716
To tighten up some of the answers (I don't have enough reputation yet to comment), the following should be sufficient:
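grep -h -m 1 '^From: ' * | sed 's/^From: *//' | sort -u
This gives you a list of the unique senders for all the messages in the directory. The -m 1 option makes grep stop after the first match in each file, and -h suppresses the "filename:" prefix that grep adds when it searches several files (without it, the ^From: anchor in the sed would never match). If you want to strip the address portion, you can extend the sed command as in che's answer. There is no need to 'cat * | grep'.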
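A minimal sketch combining the pipeline above with the address-stripping sed from che's answer, so the output is names only. It is wrapped in a function whose name, unique_sender_names, is hypothetical, and it assumes GNU or BSD grep (both support -m and -h).

```shell
# unique_sender_names is a hypothetical helper name.
# grep -m 1: stop after the first "From: " line in each file.
# grep -h:   omit the "filename:" prefix grep adds for multiple files.
# sed:       drop the "From: " label, then any " <address>" part.
unique_sender_names() {
    grep -h -m 1 '^From: ' "$@" |
        sed 's/^From: *//;s/ *<[^>]*> *//' |
        sort -u
}

# Usage, from inside the folder of messages:
#   unique_sender_names *
```

If a header carries only a bare address with no display name, the second sed command has nothing to remove, so that sender shows up as an address.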
Upvotes: 0
Reputation: 15296
Assuming there can't be stray headers in the middle of the messages, this should do the trick:
cat * | grep '^From: ' | sort -u
If there may be other misleading "From:" lines in the middle of the messages, you just need to make sure you only take the first matching line from each message, like so:
for f in * ; do grep '^From: ' "$f" | head -n 1 ; done | sort -u
Obviously you can replace the * in either command with a different glob or list of file names.
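The same first-match-per-file logic can also be sketched as a single awk process instead of a shell loop. This swaps in awk, which the answer itself does not use, and the function name first_from_lines is hypothetical. Like the loop, it prints the whole "From:" line.

```shell
# first_from_lines is a hypothetical helper name.
# FNR is the line number within the current input file, so it is 1 at the
# start of every file; resetting "seen" there means only the first
# "From: " line of each message is printed.
first_from_lines() {
    awk 'FNR == 1 { seen = 0 }
         !seen && /^From: / { print; seen = 1 }' "$@" | sort -u
}

# Usage:
#   first_from_lines *
```

Because awk reads all the files itself, no grep or head process is started per file, which can matter for large folders.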
Upvotes: 2
Reputation: 12273
Do you want to extract sender names or e-mail addresses? "From" lines usually contain both, for example:
From: Lessie <[email protected]>
Then you can use sed to remove the e-mail address part:
sed 's/^From: //;s/ *<[^>]*> *//'
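Putting it all together (the two extra sed commands strip the double quotes that often surround display names), you end up with something like this: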
for filename in *
do
    grep '^From: ' "$filename" | head -n 1 | sed 's/^From: //;s/ *<[^>]*> *//;s/^"//;s/"$//'
done | sort -u
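If you want the e-mail addresses rather than the names, a variation of the same loop can keep the text between the angle brackets instead of deleting it. This is a sketch under two assumptions: the function name unique_sender_addresses is hypothetical, and a "From:" line without angle brackets is taken to hold a bare address already.

```shell
# unique_sender_addresses is a hypothetical helper name.
# For "Name <address>" headers, keep only the text inside the angle brackets;
# a header with no brackets is assumed to hold a bare address already.
unique_sender_addresses() {
    for filename in "$@"
    do
        grep '^From: ' "$filename" | head -n 1 |
            sed 's/^From: *//;s/.*<\([^>]*\)>.*/\1/'
    done | sort -u
}

# Usage:
#   unique_sender_addresses *
```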
Upvotes: 1