What should I use in a bash script to extract email addresses from noisy lines in a file?

Question

I have a file that has one email address per line. Some of them are noisy, i.e. they contain junk characters before and/or after the address, e.g.

name.lastname@bar.com&lt;mailto
&lt;someone@foo.bar.baz.edu&gt;
&amp;lt;someone@foo.com&amp;gt;Mobile
&lt;nobody@nowere.com&gt;
&lt;ab@cd.com
no@noise.com

How can I extract the right address from each line of the file in a loop like this?

for l in `cat file_of_email_addresses`
do
     # do magic here to extract the address from $l
done

It looks like garbage before the address always ends with lt; and garbage after the address always starts with &.
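If the junk really follows that pattern, a loop can also do the job using only bash parameter expansion, so no external command runs per line. This is a minimal sketch, not a general email parser: it assumes every line contains exactly one @, and extract_addresses is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads lines on stdin and prints one cleaned address
# per line. Assumes each line has exactly one "@", leading junk ends with
# "lt;" and trailing junk starts with "&" (the pattern from the question).
extract_addresses() {
  local l local_part domain
  # "|| [[ -n $l ]]" also processes a last line without a trailing newline.
  while IFS= read -r l || [[ -n $l ]]; do
    local_part=${l%%@*}               # text before the first @
    domain=${l#*@}                    # text after the first @
    local_part=${local_part##*"lt;"}  # drop leading junk up to the last "lt;"
    domain=${domain%%"&"*}            # drop trailing junk from the first "&"
    printf '%s@%s\n' "$local_part" "$domain"
  done
}
```

Usage: extract_addresses < file_of_email_addresses. Splitting at the @ first matters: in name.lastname@bar.com&lt;mailto the trailing junk itself contains lt;, so stripping everything up to lt; on the whole line would throw away the address.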

Cyrus · Accepted Answer

Try this with GNU grep:

grep -Po '[\w.-]+@[\w.-]+' file

Output:

name.lastname@bar.com
someone@foo.bar.baz.edu
someone@foo.com
nobody@nowere.com
ab@cd.com
no@noise.com

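It's not perfect, but it may be sufficient for your task. It just grabs the run of word characters (letters, digits, underscore), dots and hyphens on each side of an @, so it does not validate addresses, and an address containing other legal characters is cut short: user+tag@example.com would come out as tag@example.com.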
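-P is a GNU grep extension and may be missing or disabled on other systems. A sketch of the same pattern as a POSIX extended regex, with \w spelled out as [[:alnum:]_], fed into a loop like the one in the question. find_addresses and process_addresses are hypothetical names; -o is supported by GNU and BSD grep but is not required by POSIX.

```shell
#!/usr/bin/env bash

# Hypothetical helper: the answer's pattern as an ERE (-E) instead of PCRE (-P).
# With no file arguments, grep reads stdin.
find_addresses() {
  grep -Eo '[[:alnum:]_.-]+@[[:alnum:]_.-]+' "$@"
}

# Hypothetical helper: loop over the extracted addresses, not the raw lines.
process_addresses() {
  local addr
  while IFS= read -r addr; do
    printf 'found: %s\n' "$addr"   # replace with the real per-address work
  done < <(find_addresses "$@")
}
```

Usage: process_addresses file_of_email_addresses. The < <(...) process substitution is bash-specific; it keeps the while loop in the current shell, so variables set inside the loop are still visible after it, which is not the case with grep ... | while ....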
