Reputation: 53
How do I get sed to delete everything except the email address?
db dump: someusername ::: kRW...0fPc ::: $2a$10$...aucvkDt86 ::: [email protected]
Upvotes: 5
Views: 14983
Reputation: 360315
This requires GNU sed:
sed -r 's/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/\n&\n/ig;s/(^|\n)[^@]*(\n|$)/\n/g;s/^\n|\n$//g;/^$/d' inputfile
Upvotes: 2
Reputation: 140417
The following works wherever the email address appears in the line, as long as a space precedes it, but only if there is one email address per line. If there is more than one, it shows only the last one on the line. It also leaves lines without a valid email address untouched.
sed 's/^.* \([^@ ]\+@[^ ]\+\) \?.*$/\1/'
$ cat dbdump
this line with no valid @ email address is untouched
::: a0$...aucvkDt86 ::: [email protected]
::: a0$... [email protected] db dump: someusername :::
$ sed 's/^.* \([^@ ]\+@[^ ]\+\) \?.*$/\1/' ./dbdump
this line with no valid @ email address is untouched
[email protected]
[email protected]
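To make the "last address wins" behaviour concrete, here is a small sketch. It assumes a sed that accepts -E (GNU and BSD sed both do) and rewrites the expression above in extended syntax; the function name last_email and the addresses are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reduce each line to its last space-delimited address.
# Same expression as above, written with -E so +, ? and ( ) need no backslashes.
last_email() {
    sed -E 's/^.* ([^@ ]+@[^ ]+) ?.*$/\1/'
}

# Made-up sample lines; the addresses are placeholders.
printf '%s\n' \
    'no address on this line' \
    'first a@example.com then b@example.org end' |
    last_email
```

Run as is, this should print the first line unchanged and b@example.org for the second, because the leading .* is greedy and swallows the earlier address.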
Upvotes: 1
Reputation: 361849
Does it have to be sed? What about grep? Here's how to use it with the regex you gave:
$ cat dbdump.txt
db dump: someusername ::: kRW...0fPc ::: $2a$10$...aucvkDt86 ::: [email protected]
another line with two e-mail addresses <[email protected]> on it -- [email protected]
$ grep -EiEio '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b' dbdump.txt
[email protected]
[email protected]
[email protected]
The -o flag prints only the matching portions, i.e. just the e-mail addresses. -i makes the matching case insensitive. It even finds multiple e-mail addresses on the same line.
Edit: I couldn't resist the -EiEio. I suppose grep -Eio or egrep -io would also work...
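As a rough sketch of how this could be wrapped for reuse, the function below reads from standard input and prints one address per line. The name extract_emails and the sample addresses are hypothetical, and it assumes a grep that supports -o and \b (GNU grep does).

```shell
#!/bin/sh
# Hypothetical helper: print every address-like token found on stdin, one per line.
# -E: extended regex, -i: ignore case, -o: print only the matched text.
# The {2,4} bound means top-level domains longer than four letters are not matched.
extract_emails() {
    grep -Eio '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b'
}

# Made-up sample input; the addresses are placeholders.
printf '%s\n' \
    'db dump: someuser ::: hash ::: alice@example.com' \
    'two on one line <bob@example.org> and carol@example.net' \
    'no address here' |
    extract_emails
```

Lines without a match produce no output at all, so the result can be piped straight into sort -u if duplicates need removing.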
Upvotes: 24
Reputation: 28618
With sed:
$ echo 'db dump: someusername ::: kRW...0fPc ::: $2a$10$...aucvkDt86 ::: [email protected]'|sed 's/.*::: //'
[email protected]
With awk:
$ echo 'db dump: someusername ::: kRW...0fPc ::: $2a$10$...aucvkDt86 ::: [email protected]'|awk '{print $NF}'
EDIT: Given the new info in your comment - it's quite hard to do what you ask without any regularity. Check the Syntax section here:
The standard says that e.g. 1$%3{C}@example.com is a valid email address (believe it or not). You can even quote it (the example given in the article is John [email protected]). So, if you follow the standard, it's almost impossible to recognize every valid e-mail address.
If you restrict your search, you can e.g. extract the lines containing @ by first doing:
cat your-file.txt|grep @
then do some of the above. You can even do something like this:
$ echo "garbage [email protected] garbage"|sed 's/[^@]* \([a-zA-Z0-9.]*@[^ ]*\).*/\1/'
[email protected]
Note that the above works under the following assumption:
local-part (everything before @) contains only letters (lower- or upper-case), digits and a dot.
Expand the character set ([a-zA-Z0-9.]) as you wish to make it less restrictive - e.g. you can use [a-zA-Z0-9._-] to include - and _ (keep the - last inside the brackets so it is taken literally rather than as a range).
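A minimal sketch of the widened character set, assuming the address is preceded by a space as in the example above. The function name extract_one_email and the address are made up; the hyphen is placed last in the bracket expression so that it is a literal character and not a range operator.

```shell
#!/bin/sh
# Hypothetical helper: keep only the address from a line of surrounding text.
# Local part may contain letters, digits, '.', '_' and '-' (hyphen last = literal).
extract_one_email() {
    sed 's/[^@]* \([a-zA-Z0-9._-]*@[^ ]*\).*/\1/'
}

# Made-up sample input; the address is a placeholder.
echo 'garbage first.last-name_1@example.com garbage' | extract_one_email
```

This uses only basic regular expression syntax, so it should not depend on GNU extensions.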
Upvotes: 0