Reputation: 91
Using grep or egrep
How many email addresses are in ‘first.last’ name format AND belong to someone whose first name starts with a letter in the first half of the alphabet? (I want to get the count.)
Excerpt of testingfile.txt:
[email protected]
hhhhhh
ItzStatic
jackass
The_Epic_Turtle
[email protected]
overthemoon34
smiley362
emilio
[email protected]
ddc44ever
[email protected]
[email protected]
middle614
IntegrityJeff
5432
[email protected]
[email protected]
To do this I wanted to filter each line to check if it had a "@" and check the first letter to see if it was A-M or a-m, and check if it had a period.
grep -c "@" testingfile.txt
grep -c "\." testingfile.txt
(although this only checks whether the line contains at least one period)
grep -c "[a-mA-M]" testingfile.txt
(still haven't gotten this one to work)
How would I combine the 3 statements together, and how would I check to see if the first character of each line is a letter between a-m or A-M?
Upvotes: 2
Views: 424
Reputation: 113964
Because you were interested in the problem of more than one email on a line, let's consider this test file:
$ cat testingfile.txt
[email protected]
hhhhhh
[email protected] overthemoon34 [email protected]
[email protected] [email protected]
IntegrityJeff
5432
[email protected]
[email protected]
This shows the first parts of all the matching email addresses whose first letters are in the first half of the alphabet:
$ grep -o "\b[a-mA-M][^[:blank:]]*@" testingfile.txt
glorious@
david.webb@
Check.it@
dickens@
jdm-mojo@
gertrude@
This counts them:
$ grep -o "\b[a-mA-M][^[:blank:]]*@" testingfile.txt | wc -l
6
If we want to restrict the match to email addresses whose name part includes a period:
$ grep -o "\b[a-mA-M][^[:blank:]]*\.[^[:blank:]]*@" testingfile.txt
david.webb@
Check.it@
And to count them:
$ grep -o "\b[a-mA-M][^[:blank:]]*\.[^[:blank:]]*@" testingfile.txt | wc -l
2
The regex used here, \b[a-mA-M][^[:blank:]]*@, is quite simple. Regexes exist that accurately select true email addresses, but they are quite complex.
grep -c counts matching lines, not individual matches. We first have to use grep -o to put each match on a separate line and then use wc -l to count the lines.
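To make the difference between counting lines and counting matches concrete, here is a minimal sketch. The file contents and the example.com addresses are made up for illustration, and it assumes a grep that supports -o and \b (GNU grep does).

```shell
# Hypothetical sample data: two addresses share the first line.
tmpfile=$(mktemp)
printf '%s\n' \
  'alice.smith@example.com bob.jones@example.com' \
  'zed.young@example.com' \
  'plainuser' > "$tmpfile"

# Same pattern as above: starts with a-m/A-M and has a period before the @.
pattern='\b[a-mA-M][^[:blank:]]*\.[^[:blank:]]*@'

# grep -c counts matching lines, so the first line is counted once.
line_count=$(grep -c "$pattern" "$tmpfile")

# grep -o prints every match on its own line, so wc -l counts matches.
match_count=$(grep -o "$pattern" "$tmpfile" | wc -l)

echo "lines: $line_count, matches: $match_count"
rm -f "$tmpfile"
```

With this sample data the line count is 1 and the match count is 2, because both qualifying addresses sit on the same line.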
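The bracket expression [a-mA-M] is not Unicode-safe: names that begin with accented or non-Latin letters may not be matched, and how the ranges are interpreted can depend on the locale.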
Upvotes: 1
Reputation: 50263
You could use:
grep "^[a-mA-M].*\..*@" testingfile.txt
This searches for lines that start with a letter from the first half of the alphabet (^[a-mA-M]), then at some point contain a period (.*\.), and at some later point contain an @ (.*@).
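Since the question asks for a count, the same pattern can be combined with -c. The sketch below is only an illustration: the sample lines and example.com domains are made up, and the stricter [^@]* variant is a suggested tightening, not part of the answer above. It assumes each line starts with the address to be tested.

```shell
# Hypothetical sample data for illustration.
sample=$(mktemp)
printf '%s\n' \
  'david.webb@example.com' \
  'glorious@example.com zed@example.com' \
  'thomas.king@example.com' \
  'Check.it@example.com' \
  'hhhhhh' > "$sample"

# The pattern from the answer, with -c to count matching lines.
# It also accepts line 2, because the period in "example.com"
# is followed later by the @ of a second address.
count_simple=$(grep -c '^[a-mA-M].*\..*@' "$sample")

# Stricter variant: [^@]* cannot cross an @, so the period must
# appear in the name part before the first @.
count_strict=$(grep -c '^[a-mA-M][^@]*\.[^@]*@' "$sample")

echo "simple: $count_simple, strict: $count_strict"
rm -f "$sample"
```

On this data the simple pattern counts 3 lines and the stricter one counts 2. The two agree whenever every line holds at most one address.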
Upvotes: 0