nullByteMe

Reputation: 6391

Using sed to extract an email address

I'm trying to become familiar with sed by extracting an email address from input in the following form:

something_from.someone:[email protected]

That is the input I'm sending to sed. I'm trying to remove everything up to and including the colon (:):

sed 'd/[[:alphanum:]]+[.][[:alphanum:]]+[:]//'

Based on my research, this should do it, but I'm getting this error:

sed: 1: "d/[[:alphanum:]]+[.][[: ...": extra characters at the end of d command

Any ideas as to what I'm doing incorrectly?

Upvotes: 2

Views: 4057

Answers (3)

Andrew Meyer

Reputation: 3

Here is my stab at it using my own example:

EMail="#E-mail: [email protected] #testing parsing"
echo "$EMail" | sed -E "s/.*[^a-zA-Z0-9._]([a-zA-Z0-9._]*@[a-zA-Z0-9._]*\.[a-zA-Z0-9._]*)[^a-zA-Z0-9._]?.*/\1/"

Here is an explanation of the parts:

sed -E : Enables extended regular expressions (GNU sed also accepts -r); -E is the POSIX-compliant spelling

s/.*[^a-zA-Z0-9._] - Start off by consuming everything up to and including a character that cannot be part of an e-mail address (the ^ negates the set)

([a-zA-Z0-9._]*@ - Match any valid e-mail characters directly preceding an @ symbol

[a-zA-Z0-9._]*\.[a-zA-Z0-9._]*) - Match a set of valid e-mail characters after the @ with at least one "." in it, enforced by the \. in the middle. The entire address pattern is enclosed in ( ) so that it is captured

[^a-zA-Z0-9._]?.*/\1/" - Consume an optional non-address character after the address (again using ^ to negate the set) plus the rest of the line, then replace the whole match with the first captured group using \1

The entire sed expression has the form "s/ pattern / replacement /".
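To try the expression end to end, here is a minimal sketch. The address user@example.com and the function name extract_email are placeholders chosen for illustration, not part of the original example. The -n option plus the p flag are an addition, so that lines without an address print nothing instead of passing through unchanged.

```shell
# Hypothetical helper: print an e-mail-like token found on each input line.
extract_email() {
  sed -nE 's/.*[^a-zA-Z0-9._]([a-zA-Z0-9._]*@[a-zA-Z0-9._]*\.[a-zA-Z0-9._]*)[^a-zA-Z0-9._]?.*/\1/p'
}

# Placeholder input; the address is made up for illustration.
EMail="#E-mail: user@example.com #testing parsing"
result=$(printf '%s\n' "$EMail" | extract_email)
printf '%s\n' "$result"

# A line with no @ does not match, so -n suppresses it entirely.
no_match=$(printf '%s\n' "no address here" | extract_email)
```

Note that the pattern requires at least one non-address character before the address, so a line that begins directly with the address would not match as written.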

Upvotes: 0

xerostomus

Reputation: 557

This is far from being perfect, but works somehow:

echo "something_from.someone:[email protected]" |  sed  -rn "s/.*[ <=;,:]([^@ <>\"\{\}:+]*@[^@ \"]*\.[^@ <>\",;=)]*)[ >=;,:\)]?.*/\1/gp" # output: [email protected]

The logic:

[separators]([name]@[domain234].[domain1])[separators] ... print (email)

For instance, I do not know how to exclude the square brackets [] from the email; "\]" does not work.

I invite anyone smarter than me to improve it. Thanks!
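On the bracket question: inside a POSIX bracket expression a backslash is not an escape character, which is why "\]" does not behave as hoped. A literal ] is matched by placing it first in the set (directly after the ^ when negating). A minimal sketch, using a made-up address and simplified separator sets rather than the full expression above:

```shell
# Placeholder input; the address is made up for illustration.
line='contact [user@example.com] today'

# A ] placed first in the set is literal, so [][] matches either bracket.
stripped=$(printf '%s\n' "$line" | sed 's/[][]//g')
printf '%s\n' "$stripped"

# Same idea inside a capture: treat brackets and spaces as separators.
email=$(printf '%s\n' "$line" | sed -n 's/.*[][ ]\([^] []*@[^] []*\)[] ].*/\1/p')
printf '%s\n' "$email"
```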

Upvotes: 0

erewok

Reputation: 7835

Your delete syntax is incorrect. To delete in sed you need to do:

sed '(separator)[pattern to delete](separator)d'

Thus, for example:

sed -e '/regex/d' infile
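A quick way to see the effect, as a sketch with made-up input lines fed through a pipe instead of a file:

```shell
# Three placeholder lines; the one matching /drop/ is removed entirely.
out=$(printf '%s\n' 'keep me' 'drop: me' 'keep me too' | sed -e '/drop/d')
printf '%s\n' "$out"
```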

This is for deleting whole lines generally. What you want to do instead is keep some part of the line so you need a capture-and-replace:

sed -e 's/regex-to-drop\(regex-to-keep\)/\1/g' input-file

The 's' is for substitute and the 'g' is for global (replace every match on a line, not just the first); the \( \) marks what is captured, while the \1 is where I want the captured text to go. If I had a series of captured items,

\(something\)\(something_else\)

I could reproduce them with another character between them by simply putting the following in the substitute part of the sed command:

\1 ;; \2

This would produce: something ;; something_else and altogether would look like:

sed -e 's/\(something\)\(something_else\)/\1 ;; \2/g' input-file
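As a small sketch of the two-group form, run against literal input from a pipe rather than a file. The second command applies the same technique to a made-up "name:value" line and is an illustration only:

```shell
# Two adjacent capture groups, re-emitted with " ;; " between them.
result=$(printf '%s\n' 'somethingsomething_else' |
  sed -e 's/\(something\)\(something_else\)/\1 ;; \2/g')
printf '%s\n' "$result"

# Same technique on a placeholder line: capture both sides of the first colon and swap them.
swapped=$(printf '%s\n' 'name:value' | sed -e 's/^\([^:]*\):\(.*\)$/\2 ;; \1/')
printf '%s\n' "$swapped"
```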

In your case, it looks like you want to drop everything before the colon:

sed -e 's/^.*:\(.*\)$/\1/g' input-file

Footnote to the above as suggested by @fedorqui:

Sed uses standard regex notation to refer to the beginning and end of a line, so "^" refers to the beginning of the line and "$" refers to the end of the line. Thus, the complete explanation of the above is as follows:

's/^.*: 

Everything from the start of the line up to the colon (the "s" means we're setting up a 'substitute' command).

Then:

\(.*\)$/ 

CAPTURE everything up to the end of the line, and

/\1/g'

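Substitute the WHOLE line with the captured item. The 'g' flag means "every match on the line"; sed already applies the command to every line of the file, and because this pattern is anchored with ^ and $ it can match only once per line, so the 'g' is optional here.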
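One detail worth checking against your data: .* is greedy, so ^.*: consumes everything up to the last colon on the line. A minimal sketch with made-up inputs follows. The variant using [^:]* is an alternative I am assuming you might want if only the first colon is the delimiter.

```shell
# Placeholder input; the address is made up for illustration.
line='something_from.someone:user@example.com'
printf '%s\n' "$line" | sed -e 's/^.*:\(.*\)$/\1/'

# With more than one colon, the two forms differ.
multi='a:b:user@example.com'

# Greedy .* keeps only what follows the LAST colon.
last=$(printf '%s\n' "$multi" | sed -e 's/^.*:\(.*\)$/\1/')

# [^:]* cannot cross a colon, so this removes only through the FIRST colon.
first=$(printf '%s\n' "$multi" | sed -e 's/^[^:]*://')

printf '%s\n' "$last" "$first"
```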

Upvotes: 5
