How can I use grep with a regex to extract only the email addresses from a file with many lines like the one below (a SQL dump, to be precise)?
Unfortunately I cannot just go back and dump the email column at this point.
Example data:
62372,35896,1,cgreen,Chad,Green,[email protected],123456789,0,,,,,,,,,3,Blah,,2013-05-02 17:42:31.659574,164842,,0,0
I have tried this but it did not work:
grep -o '[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}' file.csv
If you still want to go the grep -o route, this one works for me:
$ grep -i -o '[A-Z0-9._%+-]\+@[A-Z0-9.-]\+\.[A-Z]\{2,4\}' file.csv
[email protected]
$
I appear to have 2 versions of grep in my path, 2.4.2 and 2.5.1. Only 2.5.1 appears to support the -o option.
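Your regular expression is close, but you're missing two things:
- Pass -i to grep (or add a-z to your square-bracket expressions). As written, [A-Z0-9...] only matches uppercase letters.
- Escape the + modifiers and {} curly braces. By default grep uses basic regular expressions, where an unescaped + or { is a literal character. (Alternatively, use grep -E and leave them unescaped.)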
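For comparison, here is a minimal sketch in Python, whose re module (like grep -E) treats + and {} as quantifiers without backslashes, so the question's original pattern works once matching is made case-insensitive. The sample line is adapted from the question, with a hypothetical address (cgreen@example.com) standing in for the obfuscated one; the helper name extract_emails is also hypothetical.

```python
import re

# The question's pattern, unchanged: in Python's re syntax (as in grep -E),
# + and {2,4} are quantifiers, so no backslash escaping is needed.
# re.IGNORECASE plays the role of grep's -i.
EMAIL_RE = re.compile(r'[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}', re.IGNORECASE)


def extract_emails(text):
    """Return every substring of text matching EMAIL_RE (roughly grep -i -o)."""
    return EMAIL_RE.findall(text)


# Hypothetical address (cgreen@example.com) in place of the obfuscated one.
line = ("62372,35896,1,cgreen,Chad,Green,cgreen@example.com,123456789,0,,,,,,,,,"
        "3,Blah,,2013-05-02 17:42:31.659574,164842,,0,0")
print(extract_emails(line))  # ['cgreen@example.com']
```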
You can solve it using Python with the help of the built-in csv module and the external validators module, like this:
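import csv
import sys

import validators  # third-party package: pip install validators

# Parse the file as CSV so quoted fields and embedded commas are handled correctly
with open(sys.argv[1], newline='') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        for field in row:
            # validators.email() returns True for a valid address and a falsy
            # ValidationFailure object otherwise
            if validators.email(field):
                print(field)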
Run it like:
python3 script.py infile
That yields:
[email protected]
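If installing validators is not an option, a rough stdlib-only sketch of the same approach keeps csv.reader for the parsing but swaps in a deliberately simple regex as the per-field test. That regex is an assumption and is far looser than a real address validator; the name email_fields is hypothetical.

```python
import csv
import re
import sys

# Loose check (an assumption standing in for validators.email()):
# no whitespace, exactly one "@", and at least one dot after it.
SIMPLE_EMAIL = re.compile(r'[^@\s]+@[^@\s]+\.[^@\s]+')


def email_fields(lines):
    """Yield every CSV field that, as a whole, looks like an email address."""
    for row in csv.reader(lines):
        for field in row:
            if SIMPLE_EMAIL.fullmatch(field):
                yield field


# Only runs when a single file name is given on the command line.
if __name__ == '__main__' and len(sys.argv) == 2:
    with open(sys.argv[1], newline='') as csvfile:
        for address in email_fields(csvfile):
            print(address)
```

It is run the same way as the script above (python3 script.py infile).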
The best way to handle this is with a proper CSV parser. A simple way to accomplish that, if it's a one-time task, is to load the CSV file into your favorite spreadsheet software, then extract just the email field.
It is difficult to parse CSV with a regex, because of the possibility of escaped commas, quoted text, etc.
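To illustrate, the hypothetical line below has a quoted field containing a comma. Splitting on commas (which is effectively what awk -F ',' or a comma-based regex does) shifts every later field, whereas Python's csv module respects the quoting.

```python
import csv

# Hypothetical row: the name field is quoted because it contains a comma.
line = '62372,"Green, Chad",cgreen@example.com,0'

naive = line.split(',')            # ignores the quotes
parsed = next(csv.reader([line]))  # honours the double quotes

print(naive)   # ['62372', '"Green', ' Chad"', 'cgreen@example.com', '0']
print(parsed)  # ['62372', 'Green, Chad', 'cgreen@example.com', '0']
```

With the naive split, the third field is ' Chad"' instead of the address, so any fixed field position is off by one.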
Consider that the following are valid email addresses according to Internet standards:
If you know for a fact that you will never have this sort of data, then perhaps simple grep and awk tools will work (as in @anubhava's answer).
If you know the field position, then it is much easier with awk or cut:
awk -F ',' '{print $7}' file
OR
cut -d ',' -f7 file
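A rough Python equivalent of these one-liners, assuming (as they do) that the address is always the 7th comma-separated field and that no field contains a quoted comma. The helper name nth_field is hypothetical; note that Python lists are 0-based, so field 7 is index 6.

```python
import sys


def nth_field(line, n, sep=','):
    """Return the n-th (1-based) sep-separated field, like awk's $n.

    Returns '' when the line has fewer than n fields; awk prints an
    empty line in that case too.
    """
    fields = line.rstrip('\n').split(sep)
    return fields[n - 1] if len(fields) >= n else ''


# Only runs when a single file name is given on the command line.
if __name__ == '__main__' and len(sys.argv) == 2:
    with open(sys.argv[1]) as f:
        for line in f:
            print(nth_field(line, 7))
```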