Reputation: 5927
I have a file that has one email address per line. Some of them are noisy, i.e. contain junk characters before and/or after the address, e.g.
[email protected]<mailto
<[email protected]>
<[email protected]>Mobile
<[email protected]>
<[email protected]
[email protected]
How can I extract the right address from each line of the file in a loop like this?
for l in `cat file_of_email_addresses`
do
# do magic here to extract address from $l
done
It looks like garbage before the address always ends with lt; and garbage after the address always starts with &
Upvotes: 1
Views: 98
Reputation: 88583
Try this with GNU grep:
grep -Po '[\w.-]+@[\w.-]+' file
Output:
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
It's not perfect but perhaps it is sufficient for your task.
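If you still want the per-line loop from the question, a minimal sketch along these lines might work. It assumes a grep that supports -o (GNU and BSD grep both do) and swaps the Perl-style \w for POSIX character classes so that -P is not needed. The function names extract_address and extract_all are made up for illustration, not part of any tool.

```shell
# Hypothetical helper: print the first address-like token found in its argument.
# Prints nothing if the argument contains no such token.
extract_address() {
    printf '%s\n' "$1" | grep -Eo '[[:alnum:]_.-]+@[[:alnum:]_.-]+' | head -n 1
}

# Hypothetical helper: read the file named by $1 line by line, rather than
# word-splitting the output of cat, and extract one address per line.
# The "|| [ -n "$line" ]" part handles a last line without a trailing newline.
extract_all() {
    while IFS= read -r line || [ -n "$line" ]; do
        extract_address "$line"
    done < "$1"
}
```

Call it as extract_all file_of_email_addresses. Like the one-liner, it only looks for something address-shaped, so it will also accept strings that are not valid addresses.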
Upvotes: 1
Reputation: 180113
It would be better to use a tool that's built for pattern matching, such as sed. It would help to first decode the data, as Etan suggested, but if you're willing to assume that ; and & never appear in the address, and that @ appears only in the address, then you can do this:
sed 's/^\([^@]*;\)\?\([^&;]*@[^&;]*\).*/\2/' file_of_email_addresses
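Note that \? is a GNU extension to basic regular expressions, so the command above may not work with other sed implementations. As a rough sketch of the decode-first approach, the following uses only POSIX sed syntax. It assumes the junk is made up of the entities &lt; and &gt; (which is what the question seems to describe), that neither < nor > appears inside an address, and that @ appears only in the address. The function name decode_and_extract is hypothetical.

```shell
# Hypothetical helper: reads lines on stdin, writes one cleaned line each.
decode_and_extract() {
    # 1-2: decode the two entities into literal angle brackets
    # 3:   drop any leading junk (containing no @) up to and including "<"
    # 4:   drop everything from the first remaining "<" or ">" onward
    sed -e 's/&lt;/</g' -e 's/&gt;/>/g' \
        -e 's/^[^@]*<//' \
        -e 's/[<>].*$//'
}
```

Use it as decode_and_extract < file_of_email_addresses. Lines that contain no entities pass through unchanged.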
Upvotes: 0