Reputation: 6391
I'm trying to become familiar with sed by extracting an email address from input in the following form:
something_from.someone:[email protected]
That is the input I'm sending to sed. I'm trying to remove everything up to and including the colon (:):
sed 'd/[[:alphanum:]]+[.][[:alphanum:]]+[:]//'
Based on my research, this should do it, but I'm getting this error:
sed: 1: "d/[[:alphanum:]]+[.][[: ...": extra characters at the end of d command
Any ideas as to what I'm doing incorrectly?
Upvotes: 2
Views: 4057
Reputation: 3
Here is my stab at it using my own example:
EMail="#E-mail: [email protected] #testing parsing"
echo "$EMail" | sed -E "s/.*[^a-zA-Z0-9._]([a-zA-Z0-9._]*@[a-zA-Z0-9._]*\.[a-zA-Z0-9._]*)[^a-zA-Z0-9._]?.*/\1/"
Here is an explanation of the parts:
sed -E
: Use extended regular expressions (GNU sed also accepts -r). -E is POSIX compliant.
s/.*[^a-zA-Z0-9._]
- Start off by matching everything up to and including a character that cannot be part of an e-mail address (the ^ negates the set).
([a-zA-Z0-9._]*@
- Open the capture group and match any valid e-mail characters directly preceding the @ symbol.
[a-zA-Z0-9._]*\.[a-zA-Z0-9._]*)
- Match the valid e-mail characters after the @, with at least one "." among them, denoted by the \. in the middle. The closing ) ends the capture group, so the entire address is enclosed in the ().
[^a-zA-Z0-9._]?.*/\1/"
- Match an optional non-valid trailing character (again using ^ to negate the set) and the rest of the line, then replace the whole match with the first captured group using \1.
The entire sed expression has the form "s/ ... / ... /".
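To see the pieces working together, here is a minimal sketch that runs the same expression on a sample line. The address user@example.com is a placeholder I am assuming for illustration, and extract_email is just a hypothetical wrapper name:

```shell
# Hypothetical wrapper around the sed expression explained above.
extract_email() {
  sed -E 's/.*[^a-zA-Z0-9._]([a-zA-Z0-9._]*@[a-zA-Z0-9._]*\.[a-zA-Z0-9._]*)[^a-zA-Z0-9._]?.*/\1/'
}

# Placeholder input; the address is an assumption, not real data.
line='#E-mail: user@example.com #testing parsing'
result=$(printf '%s\n' "$line" | extract_email)
printf '%s\n' "$result"   # user@example.com
```

Single quotes are used around the expression here so the shell passes the backslashes to sed untouched.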
Upvotes: 0
Reputation: 557
This is far from being perfect, but works somehow:
echo "something_from.someone:[email protected]" | sed -rn "s/.*[ <=;,:]([^@ <>\"\{\}:+]*@[^@ \"]*\.[^@ <>\",;=)]*)[ >=;,:\)]?.*/\1/gp" # output: [email protected]
The logic:
[separators]([name]@[domain234].[domain1])[separators] ... print (email)
For instance, I do not know how to exclude these brackets [] from the email; "\]" does not work.
I beg those smarter than me to improve it. Thanks!
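On the bracket question: inside a POSIX bracket expression a backslash is not an escape, so a literal ] is matched by listing it first in the set (right after the ^ in a negated set). A minimal sketch, assuming a placeholder address wrapped in brackets:

```shell
# Placeholder input; the address is an assumption for illustration.
wrapped='[user@example.com]'

# [][] is a set containing ']' (listed first) and '['; delete both everywhere.
stripped=$(printf '%s\n' "$wrapped" | sed 's/[][]//g')

# [^]@] is a negated set excluding ']' and '@'; capture the part before '@'.
local_part=$(printf '%s\n' "$wrapped" | sed -E 's/^\[([^]@]*)@.*/\1/')

printf '%s\n' "$stripped"     # user@example.com
printf '%s\n' "$local_part"   # user
```

The same placement should work inside the longer sets above, for example [^]@ <>"] instead of [^@ <>"].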
Upvotes: 0
Reputation: 7835
Your delete syntax is incorrect: the d command goes after the pattern, not before it. To delete in sed you need to do:
sed '(separator)[pattern to delete](separator)d'
Thus, for example:
sed -e '/regex/d' infile
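A minimal sketch of that form in action, assuming a hypothetical three-line input fed through a pipe instead of infile:

```shell
# Hypothetical input: three lines, one of which contains "drop".
kept=$(printf '%s\n' 'keep this' 'drop this' 'keep that' | sed -e '/drop/d')

# Only the two lines that do not match /drop/ remain.
printf '%s\n' "$kept"
```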
The d command deletes whole lines that match. What you want to do instead is keep some part of the line, so you need a capture-and-replace:
sed -e 's/regex-to-drop\(regex-to-keep\)/\1/g' input-file
The 's' is for substitute and the 'g' is for global (replace every match on a line, not just the first). Whatever \( \) encloses is captured, while \1 is where I want the captured thing to go. If I had a series of captured items,
\(something\)\(something_else\)
I could reproduce them with another character between them by simply putting the following in the substitute part of the sed command:
\1 ;; \2
This would produce: something ;; something_else
and altogether would look like:
sed -e 's/\(something\)\(something_else\)/\1 ;; \2/g' input-file
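Run against a line of input, that looks roughly like this (the input string is one I made up to match the pattern):

```shell
# Hypothetical input containing both captured pieces back to back.
joined=$(printf '%s\n' 'somethingsomething_else' |
  sed -e 's/\(something\)\(something_else\)/\1 ;; \2/g')

printf '%s\n' "$joined"   # something ;; something_else
```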
In your case, it looks like you want to drop everything before the colon:
sed -e 's/^.*:\(.*\)$/\1/g' input-file
Footnote to the above as suggested by @fedorqui:
Sed uses standard regex notation to refer to the beginning and end of a line, so "^" refers to the beginning of the line and "$" refers to the end of the line. Thus, the complete explanation of the above is as follows:
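's/^.*:
Match everything from the start of the line up to the colon (the "s" means we're setting up a 'substitute' command). Because .* is greedy, this runs up to the last colon on the line.
Then:
\(.*\)$/
CAPTURE everything up to the end of the line, and
\1/g'
Substitute the WHOLE line with the captured item. The "g" applies the substitution to every match on a line; sed already runs the command on every line of the file.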
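Here is a minimal sketch of that command on the question's input shape. The address user@example.com is a placeholder I am assuming, and the a:b:c line is only there to show what happens when a line has more than one colon:

```shell
# Placeholder input in the question's format (address is an assumption).
addr=$(printf '%s\n' 'something_from.someone:user@example.com' |
  sed -e 's/^.*:\(.*\)$/\1/')

# .* is greedy, so with several colons everything up to the LAST one is dropped.
last=$(printf '%s\n' 'a:b:c' | sed -e 's/^.*:\(.*\)$/\1/')

# Alternative: [^:]* cannot cross a colon, so this stops at the FIRST one.
first=$(printf '%s\n' 'a:b:c' | sed -e 's/^[^:]*://')

printf '%s\n' "$addr"    # user@example.com
printf '%s\n' "$last"    # c
printf '%s\n' "$first"   # b:c
```

If the part after the colon could itself contain a colon, the second form is probably the safer choice.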
Upvotes: 5