Reputation: 353
I am trying to grep all of the email addresses from all CSV files in a working directory and print them to a newline-delimited text file. I tried:
egrep -o '.*@.*' *.csv > alltheemails.txt
But this seems to capture the entire line.
Then, I tried:
egrep -o ',.*@.*,' csv/*.csv > alltheemails.txt
I was trying to copy only the email address (and maybe the , delimiter, which I could deal with later). This also copied the entire line.
Then, I tried:
egrep -o ',.*@.*,' csv/*.csv | sed -e 's/^,...@//g' | tee alltheemails.txt
This still captured everything in front of the email. I tried:
egrep -o ',.*@.*,' csv/*.csv | sed -e 's/*^,.*@//g' | tee alltheemails.txt
And many other variations, including:
sed -e 's/.*^[[a-zA-Z0-9]*\.\_\-\+\*@[[a-zA-Z0-9]-\.]*\.[a-zA-Z0-9]{3}$]/.*^[[a-zA-Z0-9]*\.\_\-\+\*@[[a-zA-Z0-9]-\.]*\.[a-zA-Z0-9]{3}$/g' csv/*.csv | egrep -eo | tee alltheemails.txt
This produced:
firstname,surname,lead,ip,address,city,state,postal,phone,date,range,daytime,interest,sex,dob,worktime,profit_estim,extra2
Please help me. Thank you!
Upvotes: 0
Views: 2811
Reputation: 10039
sed -e '/@/!d' -e 's/.*/,&,/;s/[[:space:]]//g' -e ':a' -e 's/,[^@,]*,/,/' -e 'ta' -e 's/,\(.*\),/\1/' csv/*.csv
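extracts every email address (if present) from each line of the CSV files. The result is one output line per input line, with that line's addresses separated by ,. The :a / ta loop repeats the field-removing substitution until no field without @ is left; a single s///g pass can skip consecutive non-email fields.
If you want one address per line, append ;s/,/\n/g to the last sed expression (this works in GNU sed; for POSIX sed, keep the backslash but replace the n with a real newline).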
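For readers who want to check what the sed script keeps, here is a minimal Python sketch of the same field logic, assuming (like the sed version) that any field containing @ counts as an email address. The helper name emails_in_line is hypothetical. One difference: sed deletes lines without @, while this function returns an empty string for them.

```python
def emails_in_line(line: str) -> str:
    """Mimic the sed script on one line (hypothetical helper):
    drop all whitespace, keep only fields containing "@",
    and join the surviving fields with ","."""
    no_space = "".join(line.split())  # like s/[[:space:]]//g
    fields = no_space.split(",")
    return ",".join(f for f in fields if "@" in f)


if __name__ == "__main__":
    print(emails_in_line("test,jane.doe@example.com,new york, central park\n"))
    # one address per line, like appending s/,/\n/g
    print(emails_in_line("a@x.org,b,c@y.org").replace(",", "\n"))
```

The first call prints jane.doe@example.com; the second prints the two addresses on separate lines.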
Upvotes: 0
Reputation: 189638
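With grep -o, grep prints only the part of each line that the regex matches, so .*@.* (greedy on both sides of the @) matches the entire line. You need to provide a regex which matches only the text you actually want to extract.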
grep -Eo '[^,"@]*@[^,"@]*' csv/*.csv
(The -E option isn't really useful here, but it's harmless. If you want to use some ERE features in your regex, then it will matter.)
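The difference between the greedy pattern and the negated character class can be reproduced with Python's re.findall, which, like grep -o, returns each non-overlapping match. This is only an illustration: Python's re is not POSIX ERE, though for these two patterns it finds the same matches. The sample line and addresses are made up.

```python
import re

line = 'test,jane.doe@example.com,new york, central park'

# Greedy ".*" on both sides of "@" expands to the whole line.
greedy = re.findall(r'.*@.*', line)

# A negated class stops at a comma, a double quote, or another "@".
exact = re.findall(r'[^,"@]*@[^,"@]*', line)

print(greedy)  # ['test,jane.doe@example.com,new york, central park']
print(exact)   # ['jane.doe@example.com']
```

Note that with * a lone @ would also match; using + on both sides (which needs -E in grep) requires at least one character around the @.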
Upvotes: 1
Reputation: 974
Starting from these CSV files:
~$ more *.csv
::::::::::::::
email2.csv
::::::::::::::
[email protected],address,surname
test,[email protected],new york, central park
ternative,[email protected],paris
name,surname,nomail,address2
::::::::::::::
email.csv
::::::::::::::
[email protected],address,name,surname
name,surname,nomail,address2
test,[email protected],new york, central park
al,ternative,[email protected],paris
EDIT: A Python solution (the code is passed as a string with the -c option; see man python for details):
python -c '
import sys
# with "python -c", sys.argv[0] is "-c" itself,
# so the file name passed by bash (the csv input) is sys.argv[1]
csvfile = sys.argv[1]
email_list = []
with open(csvfile) as f:
    for X in f:
        # field delimiter
        s = X.split(",")
        for Z in s:
            # find the email address using "@"
            if "@" in Z:
                # strip spaces and the newline of a last field
                email_list.append(Z.strip())
for I in email_list:
    print(I)
' <(cat *.csv) > alltheemails.txt
You should use this Python code from bash this way: python -c 'code between single quotes' <(cat *.csv) > alltheemails.txt. The bash process substitution <(cat *.csv) runs cat *.csv and hands Python a file name (such as /dev/fd/63) from which that output can be read, which is why the script opens sys.argv[1].
Of course, you can remove the comments from the code. If you prefer, you can also put this code in a script and execute it this way: python grep.py <(cat *.csv).
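If you go the script route, a variant that opens each file named on the command line avoids the need for <(cat *.csv). This is a minimal sketch; the file name grep_emails.py and the functions email_fields and main are hypothetical, and it uses the same "any field containing @" rule as the code above.

```python
#!/usr/bin/env python3
"""Hypothetical grep_emails.py: print every comma-separated field that
contains "@" from all files named on the command line."""
import sys
from typing import Iterable, Iterator, List


def email_fields(lines: Iterable[str]) -> Iterator[str]:
    for line in lines:
        # split on the "," field delimiter
        for field in line.split(","):
            if "@" in field:
                # strip spaces and the trailing newline of a last field
                yield field.strip()


def main(paths: List[str]) -> None:
    # read the files one after another, like cat *.csv would
    for path in paths:
        with open(path, newline="") as f:
            for email in email_fields(f):
                print(email)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage would be python3 grep_emails.py *.csv > alltheemails.txt, letting the shell expand the glob into separate arguments.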
Output:
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
[email protected]
Upvotes: 0
Reputation: 3451
Perl solution for all .csv files in the current directory
The email address can be in any field
perl -lne 'print $1 if /([^,@"]+@[^,@"]+)/' *.csv > alltheemails.txt
This prints $1, the text captured by the regular expression /([^,@"]+@[^,@"]+)/, where [^,@"]+ means one or more occurrences of any character other than a comma, @ or a double quote. Only the first match on each line is printed.
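The same regex and its "first match per line" behaviour can be sketched in Python with re.search; the helper name first_email_per_line and the sample addresses are hypothetical. Stripping the trailing newline plays the role of Perl's -l switch, which otherwise would let the character class swallow the newline when the address is the last field.

```python
import re

# Same pattern as the Perl one-liner: one or more chars that are not , @ or "
EMAIL_RE = re.compile(r'([^,@"]+@[^,@"]+)')


def first_email_per_line(lines):
    """Like perl -lne 'print $1 if /(...)/': at most one match per line."""
    out = []
    for line in lines:
        m = EMAIL_RE.search(line.rstrip("\n"))  # like perl -l, drop the newline
        if m:
            out.append(m.group(1))
    return out


sample = [
    'name,surname"ann@example.com,address',
    'name,surname,nomail,address2',
    'nam,test,bob@example.org"new york, central park',
    'two,c@x.org,d@y.org',
]
print(first_email_per_line(sample))
# ['ann@example.com', 'bob@example.org', 'c@x.org']
```

For the last sample line only c@x.org is reported; EMAIL_RE.findall would return both addresses if you need every match.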
input:
name,surname"[email protected],address
name,surname,nomail,address2
nam,test,[email protected]"new york, central park
al,ternative,[email protected],paris
alternative,[email protected],paris
output:
[email protected]
[email protected]
[email protected]
[email protected]
If you prefer awk (the three-argument form of match() used here requires GNU awk, gawk):
awk '{if (match($0, /[^,@"]+@[^,@"]+/, m)) print m[0]}' *.csv > alltheemails.txt
Upvotes: 1