NVK

Reputation: 53

How do I get "sed" to delete everything except the email address?

How do I get "sed" to delete everything except the email address?

db dump: someusername ::: kRW...0fPc ::: $2a$10$...aucvkDt86 ::: [email protected]

Upvotes: 5

Views: 14983

Answers (4)

Dennis Williamson

Reputation: 360315

This requires GNU sed:

sed -r 's/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/\n&\n/ig;s/(^|\n)[^@]*(\n|$)/\n/g;s/^\n|\n$//g;/^$/d' inputfile
  • split input lines so email addresses and other strings are separated by newlines
  • erase sequences that consist of only non-@ characters delimited by newlines or the beginning or end of the input line
  • erase extra newlines and blank lines
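
The steps above can be laid out one per line, which makes each substitution easier to follow. This is a minimal sketch, assuming GNU sed (for -r, \b, \n in the pattern, and the i flag); the function name extract_emails and the sample addresses are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the same commands as the one-liner, one step per line.
extract_emails() {
  sed -r '
    # 1. Put a newline before and after every address-like match.
    s/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/\n&\n/ig
    # 2. Collapse each newline-delimited chunk that has no "@" into one newline.
    s/(^|\n)[^@]*(\n|$)/\n/g
    # 3. Trim a leading or trailing newline, then drop lines left empty.
    s/^\n|\n$//g
    /^$/d
  ' "$@"
}

# Made-up sample input: two addresses on the first line, none on the second.
printf '%s\n' 'foo alice@example.com bar bob@example.org baz' 'no address here' |
  extract_emails
```

With this input, each address should come out on its own line and the second input line should disappear. A line that contains a stray @ but no address-like match is not covered by step 2 and would pass through unchanged.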

Upvotes: 2

SiegeX

Reputation: 140417

The following will work wherever the email address is in the line, as long as a space precedes it and there is only one email address per line. If there is more than one, it shows only the last one in the line. It also won't touch lines that don't have valid email addresses in them.

sed 's/^.* \([^@ ]\+@[^ ]\+\) \?.*$/\1/'

Input

$ cat dbdump
this line with no valid @ email address is untouched
::: a0$...aucvkDt86 ::: [email protected]
::: a0$... [email protected] db dump: someusername :::

Output

$  sed 's/^.* \([^@ ]\+@[^ ]\+\) \?.*$/\1/' ./dbdump
this line with no valid @ email address is untouched
[email protected]
[email protected]
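
A minimal sketch of the two edge cases, assuming GNU sed (for \+ and \?). The wrapper name last_email and the sample addresses are made up for illustration; the sed expression is the one used above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the expression from this answer.
last_email() {
  sed 's/^.* \([^@ ]\+@[^ ]\+\) \?.*$/\1/' "$@"
}

# Two addresses on one line: the greedy leading ".*" leaves only the last one.
echo 'x alice@example.com y bob@example.org z' | last_email

# No space in front of the address: the pattern does not match,
# so the line is printed unchanged.
echo 'alice@example.com trailing text' | last_email
```

The second case is worth keeping in mind for dumps where the address can be the first field on the line.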

Upvotes: 1

John Kugelman

Reputation: 361849

Does it have to be sed? What about grep? Here's how to use it with the regex you gave:

$ cat dbdump.txt 
db dump: someusername ::: kRW...0fPc ::: $2a$10$...aucvkDt86 ::: [email protected]
another line with two e-mail addresses <[email protected]> on it -- [email protected]

$ grep -EiEio '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b' dbdump.txt
[email protected]
[email protected]
[email protected]

The -o flag prints only the matching portions, i.e. just the e-mail addresses. -i makes the matching case insensitive. It even finds multiple e-mail addresses on the same line.

Edit: I couldn't resist the -EiEio. I suppose grep -Eio or egrep -io would also work...
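
For reference, a small sketch of the shorter grep -Eio form, assuming GNU grep (for \b and -o). The function name grep_emails and the sample addresses are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper using the shorter flag set with the same regex.
grep_emails() {
  grep -Eio '\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b' "$@"
}

# Made-up sample: mixed case, two addresses on one line, and a line with none.
printf '%s\n' 'first <Alice@Example.COM> then bob@example.org' 'nothing here' |
  grep_emails
```

Because -o prints the matched text itself, the addresses keep their original capitalization even though -i makes the match case insensitive.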

Upvotes: 24

icyrock.com

Reputation: 28618

With sed:

$ echo 'db dump: someusername ::: kRW...0fPc ::: $2a$10$...aucvkDt86 ::: [email protected]' | sed 's/.*::: //'
[email protected]

With awk:

$ echo 'db dump: someusername ::: kRW...0fPc ::: $2a$10$...aucvkDt86 ::: [email protected]' | awk '{print $NF}'

EDIT: Given the new info in your comment - it's quite hard to do what you ask without any regularity. Check the Syntax section here:

The standard says that e.g. 1$%3{C}@example.com is a valid email address (believe it or not). You can even quote it (the example given in the article is John [email protected]). So, if you follow the standard, it's almost impossible to recognize a valid e-mail address reliably.

If you restrict your search, you can e.g. extract the lines containing @ by first doing:

grep @ your-file.txt

then do some of the above. You can even do something like this:

$ echo "garbage [email protected] garbage"|sed 's/[^@]* \([a-zA-Z0-9.]*@[^ ]*\).*/\1/'
[email protected]

Note that the above works under the following assumptions:

  • There's a space before an email address
  • There are no spaces in an email address itself
  • There is one email address in the line (it will actually get only the first one, so it can work with more than one)
  • The local-part (all before @) contains only letters (lower- or upper-case), digits and a dot

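Expand the character set ([a-zA-Z0-9.]) as you wish to make it less restrictive - e.g. you can do [a-zA-Z0-9._-] to include - and _. Keep the hyphen last (or first) inside the brackets; written as .-_ it would be read as a range from . to _, which also matches @ and many other characters.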
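
A minimal sketch that combines the grep @ filter with the sed expression and widens the local-part set to allow _ and -, placing the hyphen last so it is taken literally. The function name first_email and the sample address are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only lines containing "@", then extract the
# address with a local-part set that also accepts "_" and "-".
first_email() {
  grep @ "$@" | sed 's/[^@]* \([a-zA-Z0-9._-]*@[^ ]*\).*/\1/'
}

# Made-up sample: one line with a hyphenated local part, one line without "@".
printf '%s\n' 'garbage first-last_name@example.com garbage' 'no address on this line' |
  first_email
```

With the narrower set [a-zA-Z0-9.], the same input line would not match at all and would be printed unchanged, since the local part contains - and _.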

Upvotes: 0
