ellsusan

Reputation: 91

Using grep cmd to filter

Using grep or egrep

How many email addresses are in ‘first.last’ name format AND involve someone whose first name starts with a letter in the first half of the alphabet? (I want to get the count)

Excerpt of testingfile.txt:

[email protected]
hhhhhh
ItzStatic
jackass
The_Epic_Turtle
[email protected]
overthemoon34
smiley362
emilio
[email protected]
ddc44ever
[email protected]
[email protected]
middle614
IntegrityJeff
5432
[email protected]
[email protected]

To do this, I wanted to filter each line by checking three things: whether it contains an "@", whether its first letter is in A-M or a-m, and whether it contains a period.

grep -c "@" testingfile.txt

grep -c "\." testingfile.txt (although this only checks whether the line contains at least one period, not where it is)

grep -c "[a-mA-M]" testingfile.txt (still haven't gotten this one to work)

How would I combine the three commands into one, and how would I check that the first character of each line is a letter between a-m or A-M?

Upvotes: 2

Views: 424

Answers (2)

John1024

Reputation: 113964

Finding email addresses that start with [a-mA-M]

Because you were interested in the problem of more than one email on a line, let's consider this test file:

$ cat testingfile.txt 
[email protected]
hhhhhh
[email protected] overthemoon34 [email protected]
[email protected] [email protected]
IntegrityJeff
5432
[email protected]
[email protected]

This shows the first parts of all the matching email addresses whose first letters are in the first half of the alphabet:

$ grep -o "\b[a-mA-M][^[:blank:]]*@" testingfile.txt 
glorious@
david.webb@
Check.it@
dickens@
jdm-mojo@
gertrude@

This counts them:

$ grep -o "\b[a-mA-M][^[:blank:]]*@" testingfile.txt | wc -l
6

Being more strict about the "first" name

If we want to restrict the match to email addresses whose name part includes a period:

$ grep -o "\b[a-mA-M][^[:blank:]]*\.[^[:blank:]]*@" testingfile.txt 
david.webb@
Check.it@

And to count them:

$ grep -o "\b[a-mA-M][^[:blank:]]*\.[^[:blank:]]*@" testingfile.txt | wc -l
2

Notes

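  1. The regex used here, \b[a-mA-M][^[:blank:]]*@, is quite simple: it only requires a word starting with a letter in a-m or A-M, followed by non-blank characters up to an @. Regexes exist that accurately select true email addresses, but they are quite complex.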

  2. grep -c counts lines. We first have to use grep -o to put each match on a separate line and then use wc -l to count the lines.

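  3. The regex [a-mA-M] is not unicode-safe: it is written for unaccented ASCII letters, and how a bracket range treats other characters depends on the locale.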
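
To make note 2 concrete, here is a minimal sketch that runs both counting styles against a small made-up file. The addresses and the temporary file are assumptions for illustration, not the contents of the original testingfile.txt:

```shell
# Hypothetical sample data; the addresses are made up for illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
alice.smith@example.com bob.jones@example.com
zed.young@example.com
hhhhhh
EOF

# Same pattern as in the answer above (the \b word boundary is a grep extension).
pattern='\b[a-mA-M][^[:blank:]]*\.[^[:blank:]]*@'

# grep -c counts matching LINES, no matter how many matches each line holds.
line_count=$(grep -c "$pattern" "$sample")

# grep -o prints each match on its own line, so wc -l counts MATCHES.
match_count=$(grep -o "$pattern" "$sample" | wc -l)

echo "lines: $line_count, matches: $match_count"
rm -f "$sample"
```

With this sample, grep -c reports 1 because only the first line matches, while grep -o with wc -l reports 2, one per address on that line.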

Upvotes: 1

JNevill

Reputation: 50263

You could use:

 grep "^[a-mA-M].*\..*@" testingfile.txt

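This matches lines that start with a letter from the first half of the alphabet (^[a-mA-M]), then at some point contain a period (.*\.), and at some later point contain an @ (.*@). It prints the matching lines; add the -c option to get the count instead. Note that it assumes one address per line, since ^ anchors the match to the start of the line.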
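
If each line holds at most one address, a tighter variant can be sketched as below. The sample lines are made up, and the stricter pattern is only one possible reading of "first.last" (exactly one period before the @), not a full email validator:

```shell
# Hypothetical sample data, one entry per line; not the original testingfile.txt.
sample=$(mktemp)
cat > "$sample" <<'EOF'
alice.smith@example.com
Bob.Jones@example.com
nancy.drew@example.com
carol@example.com
j.r.tolkien@example.com
IntegrityJeff
EOF

# Pattern from the answer: a period anywhere before an @.
loose_count=$(grep -c '^[a-mA-M].*\..*@' "$sample")

# Assumed stricter reading: exactly one period in the part before the @.
strict_count=$(grep -c '^[a-mA-M][^@.]*\.[^@.]*@' "$sample")

echo "loose: $loose_count, strict: $strict_count"
rm -f "$sample"
```

Here the loose pattern counts three lines and the strict one counts two, because j.r.tolkien has two periods before the @ and so does not fit the assumed first.last shape.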

Upvotes: 0
