AlexG

Reputation: 13

Regex for grepping emails in files

I would like to extract valid email addresses from text files in a directory using bash.

My regex:

grep -Eoh \
         "\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,8}\b" * \
         | sort -u > mail_list

This regex satisfies all my requirements, but it cannot exclude addresses such as:

^%&[email protected]

and

[email protected]

(with two or more dots).

These kinds of addresses should be excluded.

How can I modify this regex to exclude these types of emails?
I can use only one expression for this task.

Upvotes: 0

Views: 144

Answers (2)

savanto

Reputation: 4550

Try this regex:

'\b[A-Za-z0-9]+[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+\.)+[A-Za-z]{2,8}\b'

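I added an alphanumeric class at the front, which forces the address to begin with at least one letter or digit; after that, it may also contain symbols. Note that because both parts use +, the local part must be at least two characters long; change the second + to * if one-character local parts should also match.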

After the @ sign, I added a group that matches one or more letters, digits, or hyphens followed by exactly one period. This group can repeat, so the pattern still matches long.domain.name.com but rejects consecutive dots.

Finally, the regex ends with the final string as you had it, for example 'com'.
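To see how the pieces described above behave, here is a minimal sketch that feeds a few made-up lines through the pattern. The helper name extract_emails and all sample addresses are hypothetical, and the sketch assumes that "two or more dots" means consecutive dots in the domain. It also assumes GNU grep, since \b is a GNU extension.

```shell
# Hypothetical helper: reads stdin, prints only the matched addresses (-o).
extract_emails() {
    grep -Eo '\b[A-Za-z0-9]+[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+\.)+[A-Za-z]{2,8}\b'
}

# Made-up sample input: two well-formed addresses, one with consecutive
# dots in the domain, and one line without any address.
printf '%s\n' \
    'alice@example.com' \
    'bob@mail.example.org' \
    'carol@example..com' \
    'no address here' | extract_emails
```

With this input, only the first two addresses should be printed; the line with consecutive dots is dropped because each repetition of the domain group needs at least one character before its period.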


Update:

Since \b matches a word boundary, and the symbols ^%& are not word characters, the above will still match [email protected] even though it is preceded by undesired characters. To avoid this, use a negative lookbehind. This requires grep -P (Perl-compatible regular expressions) instead of -E:

grep -P '(?<![%&^])\b[A-Za-z0-9]+[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+\.)+[A-Za-z]{2,8}\b'

The (?<![%&^]) tells the regex engine to continue matching only if the position is not immediately preceded by one of the characters %, &, or ^.
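The difference between the two variants can be checked side by side. This is a rough sketch, assuming a GNU grep built with PCRE support so that -P is available; the helper names extract_ere and extract_pcre and the sample addresses are made up for illustration.

```shell
# Hypothetical helpers: both read stdin and print only the matched text (-o).
extract_ere() {
    grep -Eo '\b[A-Za-z0-9]+[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+\.)+[A-Za-z]{2,8}\b'
}
extract_pcre() {
    grep -Po '(?<![%&^])\b[A-Za-z0-9]+[A-Za-z0-9._%+-]+@([A-Za-z0-9-]+\.)+[A-Za-z]{2,8}\b'
}

# Made-up input: one clean address, one preceded by unwanted symbols.
sample_lines() {
    printf '%s\n' 'alice@example.com' '^%&dave@example.com'
}

echo '-E variant:'
sample_lines | extract_ere     # still extracts the tail of the second line
echo '-P variant:'
sample_lines | extract_pcre    # skips the second line entirely
```

The -E variant should print both alice@example.com and dave@example.com, while the -P variant should print only alice@example.com. The lookbehind inspects only the single character directly before the match, so a line such as ^%&foo.bar@example.com would still yield bar@example.com with either variant.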

Upvotes: 0

Tom Fenech

Reputation: 74595

The email address ^%&[email protected] is actually a valid email address.

You can do this in Perl using the Email::Valid module (this assumes that each entry is on its own line):

perl -MEmail::Valid -ne 'print if Email::Valid->address($_)' file1 file2

file1

not email
[email protected]

file2

not email
[email protected]
^%&[email protected]
[email protected]

output

[email protected]
[email protected]
^%&[email protected]

Upvotes: 1
