technerdius

Reputation: 353

extract all email addresses from all csv files in working directory using Linux

I am trying to grep all of the email addresses from all CSV files in a working directory and write them to a text file, one address per line (\n-delimited). I tried:

egrep -o '.*@.*' *.csv > alltheemails.txt

But this seems to capture the entire line.

Then, I tried:

egrep -o ',.*@.*,' csv/*.csv > alltheemails.txt

I was trying to copy only the email address (and maybe the , delimiter, which I can deal with later). This also copied the entire line.

Then, I tried:

egrep -o ',.*@.*,' csv/*.csv | sed -e 's/^,...@//g' | tee alltheemails.txt

This still captured everything in front of the email. I tried:

egrep -o ',.*@.*,' csv/*.csv | sed -e 's/*^,.*@//g' | tee alltheemails.txt

And many other variations, including:

sed -e 's/.*^[[a-zA-Z0-9]*\.\_\-\+\*@[[a-zA-Z0-9]-\.]*\.[a-zA-Z0-9]{3}$]/.*^[[a-zA-Z0-9]*\.\_\-\+\*@[[a-zA-Z0-9]-\.]*\.[a-zA-Z0-9]{3}$/g' csv/*.csv | egrep -eo | tee alltheemails.txt

This produced:

firstname,surname,lead,ip,address,city,state,postal,phone,date,range,daytime,interest,sex,dob,worktime,profit_estim,extra2

Please help me. Thank you!

Upvotes: 0

Views: 2811

Answers (4)

NeronLeVelu

Reputation: 10039

sed -e '/@/!d' -e 's/.*/,&,/;s/[[:space:]]//g' -e ':a' -e 's/,[^@,]*,/,/' -e 'ta' -e 's/,\(.*\),/\1/' csv/*.csv

This extracts every email address (if any) from each line of the CSV files. The result is the line's email addresses, separated by ,.

To get one address per line, append ;s/,/\n/g to the end of the sed script (GNU sed; for POSIX sed, keep the backslash but replace the n with a literal newline).
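For comparison, the sed script's logic (drop lines without @, strip whitespace, keep only the comma-separated fields that contain @) can be sketched in Python. This is a rough illustration, not part of the answer: the function name emails_per_line and the sample lines are made up, and like the sed version it does not understand quoted fields that contain commas.

```python
import re

def emails_per_line(lines, sep=","):
    """Hypothetical helper mirroring the sed script: for each line that
    contains '@', keep only the comma-separated fields containing '@'."""
    result = []
    for line in lines:
        if "@" not in line:               # like /@/!d: skip lines without '@'
            continue
        line = re.sub(r"\s", "", line)    # like s/[[:space:]]//g
        fields = [f for f in line.split(",") if "@" in f]
        result.append(sep.join(fields))   # sep="\n" gives one address per line
    return result

sample = ["name,a@x.org,city,b@y.org\n", "name,surname,nomail\n"]
print(emails_per_line(sample))        # ['a@x.org,b@y.org']
print(emails_per_line(sample, "\n"))  # ['a@x.org\nb@y.org']
```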

Upvotes: 0

tripleee

Reputation: 189638

With grep -o you need to provide a regex which matches only the text you actually want to extract.

grep -Eoh '[^,"@]*@[^,"@]*' csv/*.csv > alltheemails.txt

(-h drops the file-name prefix that grep adds to each match when it searches several files. The -E option isn't really needed here, but it's harmless; it only matters if you use ERE features in your regex.)
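To see why the question's .*@.* grabs the whole line while the negated character classes stop at the field boundaries, here is a small Python sketch. It uses re.findall, which, like grep -o, returns each non-overlapping match; the sample line is hypothetical.

```python
import re

# Hypothetical CSV line with a quoted email field.
line = 'test,"jane@example.com",new york, central park'

# Greedy pattern from the question: each .* runs as far as it can,
# so the single match spans the entire line.
greedy = re.findall(r".*@.*", line)

# Negated classes cannot cross , " or @, so the match stops at the field edges.
fields = re.findall(r'[^,"@]*@[^,"@]*', line)

print(greedy)  # ['test,"jane@example.com",new york, central park']
print(fields)  # ['jane@example.com']
```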

Upvotes: 1

Starting from these csv:

~$ more *.csv 
::::::::::::::
email2.csv
::::::::::::::
[email protected],address,surname
test,[email protected],new york, central park
ternative,[email protected],paris
name,surname,nomail,address2
::::::::::::::
email.csv
::::::::::::::
[email protected],address,name,surname
name,surname,nomail,address2
test,[email protected],new york, central park
al,ternative,[email protected],paris

EDIT: A Python solution (the code is passed inline with the -c option; see man python in a shell for details):

python -c '
import sys

# the file name passed on the command line (here, the CSV stream);
# with -c, sys.argv[0] is the string "-c" itself, so take sys.argv[1]
csvfile = sys.argv[1]
email_list = []

with open(csvfile) as f:
    for X in f:
        # split the line on the field delimiter
        s = X.split(",")
        for Z in s:
            # keep any field that contains "@" (strip spaces and the newline)
            if "@" in Z:
                email_list.append(Z.strip())
for I in email_list:
    print(I)
' <(cat *.csv) > alltheemails.txt

Run this Python code from bash like this: python -c 'code between single quotes' <(cat *.csv) > alltheemails.txt. Here <(cat *.csv) is bash process substitution: the output of cat *.csv (all the CSV files concatenated) is exposed as a file name such as /dev/fd/63, which is passed to Python as its argument and opened as csvfile.

Of course, you can remove the comments. If you prefer, you can also put the code in a script and run it like this: python grep.py <(cat *.csv). Output:

[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
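For reference, a Python 3 variant could read the files itself instead of relying on process substitution. This sketch swaps in the standard csv module instead of split(","), so a quoted field such as "new york, central park" stays one field; the function name emails_from_rows and the script name extract_emails.py are hypothetical.

```python
import csv
import glob

def emails_from_rows(lines):
    """Return every CSV field containing '@' from an iterable of text lines."""
    # csv.reader honours quoting, so commas inside quoted fields don't split them
    return [field.strip()
            for row in csv.reader(lines)
            for field in row
            if "@" in field]

if __name__ == "__main__":
    # every *.csv file in the current directory, in a stable order
    for path in sorted(glob.glob("*.csv")):
        # newline="" is what the csv module documentation recommends
        with open(path, newline="") as f:
            for email in emails_from_rows(f):
                print(email)
```

Run it as python3 extract_emails.py > alltheemails.txt from the directory that holds the CSV files.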

Upvotes: 0

Chris Koknat

Reputation: 3451

Perl solution for all .csv files in the current directory.
The email address can be in any field; only the first address on each line is printed.

perl -lne 'print $1 if /([^,@"]+@[^,@"]+)/' *.csv > alltheemails.txt

Prints the captured match $1
from the regular expression /([^,@"]+@[^,@"]+)/,
where [^,@"]+ = one or more characters that are not a comma, @ or double quote.

input:

name,surname"[email protected],address
name,surname,nomail,address2
nam,test,[email protected]"new york, central park
al,ternative,[email protected],paris
alternative,[email protected],paris

output:

[email protected]
[email protected]
[email protected]
[email protected]

If you prefer GNU awk (the three-argument form of match() is a gawk extension):

awk '{if (match($0, /[^,@"]+@[^,@"]+/, m)) print m[0]}' *.csv > alltheemails.txt
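Both the Perl one-liner and the awk version print at most one address per line, because each tests the regex once per line. A Python sketch with the same character class makes that visible, alongside a second function that collects every match on the line instead (roughly what Perl's print $1 while /(...)/g would do). Both function names are hypothetical.

```python
import re

# Same pattern as the Perl/awk answer: runs of characters other than , @ "
EMAIL_RE = re.compile(r'[^,@"]+@[^,@"]+')

def first_email_per_line(lines):
    """At most one match per line, like `print $1 if /(...)/`."""
    out = []
    for line in lines:
        m = EMAIL_RE.search(line.rstrip("\n"))
        if m:
            out.append(m.group(0))
    return out

def all_emails_per_line(lines):
    """Every non-overlapping match on each line."""
    return [m.group(0)
            for line in lines
            for m in EMAIL_RE.finditer(line.rstrip("\n"))]

lines = ["a@x.org,name,b@y.org\n", "name,surname,nomail\n"]
print(first_email_per_line(lines))  # ['a@x.org']
print(all_emails_per_line(lines))   # ['a@x.org', 'b@y.org']
```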

Upvotes: 1
