Arringar1

Reputation: 415

Extract email addresses from text file using regex with bash or command line

How can I use grep with a regex to extract only the email addresses from a file with many lines like the one below? (It is a SQL dump, to be precise.)

Unfortunately I cannot just go back and dump the email column at this point.

Example data:

62372,35896,1,cgreen,Chad,Green,[email protected],123456789,0,,,,,,,,,3,Blah,,2013-05-02 17:42:31.659574,164842,,0,0

I have tried this but it did not work:

grep -o '[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}' file.csv

Upvotes: 18

Views: 33294

Answers (4)

Digital Trauma

Reputation: 15996

If you still want to go the grep -o route, this one works for me:

$ grep -i -o '[A-Z0-9._%+-]\+@[A-Z0-9.-]\+\.[A-Z]\{2,4\}' file.csv
[email protected]
$ 

I appear to have 2 versions of grep in my path, 2.4.2 and 2.5.1. Only 2.5.1 appears to support the -o option.

Your regular expression is close, but you're missing 2 things:

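  • Regular expressions are case sensitive, so either pass -i to grep or add a-z to each of your bracket expressions.
  • By default grep uses basic regular expressions, where + and {} are literal characters unless escaped as \+ and \{ \}. Alternatively, pass -E to use extended syntax, where they work unescaped.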
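For comparison, here is a minimal Python sketch of the same two points. It reuses the pattern from the question; the sample row and its address (cgreen@example.com) are made up for illustration, since the real address is not shown in the post. Python's re module uses extended-style syntax, so + and {2,4} need no backslashes, and re.IGNORECASE plays the role of grep's -i.

```python
import re

# Pattern from the question; IGNORECASE makes A-Z also match a-z (like grep -i).
EMAIL_RE = re.compile(r"[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}", re.IGNORECASE)


def extract_emails(text):
    """Return every substring of text that matches the email pattern."""
    return EMAIL_RE.findall(text)


# Hypothetical row shaped like the example data (the address is an assumption).
sample = "62372,35896,1,cgreen,Chad,Green,cgreen@example.com,123456789,0"

if __name__ == "__main__":
    for address in extract_emails(sample):
        print(address)
```

Without the IGNORECASE flag the same pattern finds nothing in this row, which is the case-sensitivity problem described above.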

Upvotes: 50

Birei

Reputation: 36262

You can solve it with the built-in csv module and the external validators module, like this:

import validators
import csv
import sys

with open(sys.argv[1], newline='') as csvfile:
    csvreader = csv.reader(csvfile)
    for row in csvreader:
        for field in row:
            if validators.email(field):
                print(field)

Run it like:

python3 script.py infile

That yields:

[email protected]

Upvotes: 1

Jonathan Hall

Reputation: 79594

The best way to handle this is with a proper CSV parser. A simple way to accomplish that, if it's a one-time task, is to load the CSV file into your favorite spreadsheet software, then extract just the email field.

It is difficult to parse CSV with a regex, because of the possibility of escaped commas, quoted text, etc.
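A small Python sketch of the quoting problem, assuming a made-up row in which one field contains a comma inside quotes (the data and the address are hypothetical). Splitting on commas breaks the quoted field in two and shifts every later field, while the csv module keeps it whole.

```python
import csv
import io

# Hypothetical row: the third field holds a comma inside double quotes.
line = '1,cgreen,"Green, Chad",cgreen@example.com\n'

# Naive approach: split on every comma, ignoring the quotes.
naive = line.rstrip("\n").split(",")

# CSV-aware approach: the quoted comma stays inside its field.
parsed = next(csv.reader(io.StringIO(line)))

if __name__ == "__main__":
    print(len(naive), naive)
    print(len(parsed), parsed)
```

With the naive split the email is no longer in the fourth position, so any tool that counts commas would pick the wrong field for rows like this one.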

Consider, the following are valid email addresses, according to Internet standards:

If you know for a fact that you will never have this sort of data, then perhaps simple grep and awk tools will work (as in @anubhava's answer).

Upvotes: 1

anubhava

Reputation: 785156

If you know the field position then it is much easier with awk or cut:

awk -F ',' '{print $7}' file

OR

cut -d ',' -f7 file
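If some rows might contain quoted commas, the same field-by-position idea can be sketched in Python with the csv module. This is only an illustration: the helper name column and the two sample rows (including their addresses) are made up, and position 7 is taken from the commands above.

```python
import csv
import io


def column(lines, index):
    """Yield the field at 1-based position index from each CSV row that has it."""
    for row in csv.reader(lines):
        if len(row) >= index:
            yield row[index - 1]


# Hypothetical rows shaped like the example data in the question.
sample = (
    "62372,35896,1,cgreen,Chad,Green,cgreen@example.com,123456789,0\n"
    "62373,35897,1,jdoe,Jane,Doe,jdoe@example.org,987654321,0\n"
)

if __name__ == "__main__":
    for value in column(io.StringIO(sample), 7):
        print(value)
```

To run it on a real file, pass an open file object (opened with newline='') instead of the StringIO wrapper.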

Upvotes: 3
