Reputation: 1189
I am trying to use sed to print each matched non-numeric string on a new line.
So, if I have the string abc def 123 (ghi), I want the output to be:
(abc)
(def)
(ghi)
This is what I have tried:
echo "abc def 123 (ghi)" | sed -r 's/([a-z]+)/(\1)\n/g'
But this outputs the following:
(abc)
 (def)
 123 ((ghi)
)
I am quite confused here and have several questions: Why is there a leading space on lines 2 and 3? Why is there a double bracket before ghi? Why is 123 not eliminated? Why does the closing bracket appear on its own on the last line?
Update
Actually, I want to extract URLs from a specific domain. Using the suggestions in the comments and answers, I tried the following:
in="https://www.example.com/user1 ddsf none http://www.example.com/user2 kbu7f7yy"
echo $in | sed 's/http[s]*:\/\/www.example.com\/[^ ]*/&\n/g'
This printed the following:
https://www.example.com/user1
ddsf none http://www.example.com/user2
kbu7f7yy
So, I tried this (as suggested in one )
echo $in | sed 's/.*\(http[s]*:\/\/www.example.com\/[^ ]*\).*/\1\n/g'
But I ended up getting:
http://www.example.com/user2
Upvotes: 0
Views: 69
Reputation: 5665
The sed can be simple: sed 's/[()0-9]//g; s/[a-z]\+/(&)\n/g; s/ //g;'
In (&)\n, & is sed shorthand for the matched word.
This could also be done this way: grep -Pow '[a-z]+' | sed 's/.*/(&)/'
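Written out with one -e per expression, the sed one-liner above does the following. This is a sketch assuming GNU sed, since \+ in a basic regexp and \n in the replacement are GNU extensions:

```shell
# Same three expressions as the one-liner, one -e each.
#   1. s/[()0-9]//g       delete every parenthesis and digit -> "abc def  ghi"
#   2. s/[a-z]\+/(&)\n/g  wrap each run of letters in parens and add a newline;
#                         & stands for the whole match
#   3. s/ //g             delete the spaces left over between the words
printf '%s\n' 'abc def 123 (ghi)' |
  sed -e 's/[()0-9]//g' -e 's/[a-z]\+/(&)\n/g' -e 's/ //g'
```

Because step 2 also adds a newline after the last word, the output ends with an empty line.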
For the URL example, grep is a lot easier than sed for extracting words: grep -Pow 'http\S+'
-P for Perl-compatible matching, so that \S+ means 'non-space'
-o to print only the matching part
-w for word matching (roughly equivalent to \bhttp\S+\b)
If for some reason you still want to add parens: grep -Pow 'http\S+' | sed 's/.*/(&)/'
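The question asks for URLs from one specific domain, so the pattern can name that domain instead of matching anything that starts with http. This sketch assumes www.example.com from the question is the target and uses POSIX extended regexps (-E) instead of -P; the dots are escaped so they match literally, and [^ ]* stops at the next space. Note that -o is not in POSIX grep, though GNU and BSD grep both support it.

```shell
in="https://www.example.com/user1 ddsf none http://www.example.com/user2 kbu7f7yy"

# https? makes the "s" optional; -o prints each match on its own line.
printf '%s\n' "$in" | grep -Eo 'https?://www\.example\.com/[^ ]*'
```

Passing "$in" quoted keeps the shell from word-splitting and glob-expanding the value, which the unquoted echo $in in the question does not.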
Upvotes: 0
Reputation: 58578
This might work for you (GNU sed):
sed -E '/\n/!s/\<[[:alpha:]]+\>/\n(&)\n/g;/^\([[:alpha:]]+\)/P;D' file
This wraps each alphabetic string in parentheses and surrounds it with newlines, then prints only those lines that begin with an opening paren, alphabetic characters and a closing paren.
For URLs, maybe:
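To see what the substitution produces before P and D filter it, the substitution can be run on its own and its result shown with sed's l command, which prints each line unambiguously and marks its end with $. A sketch assuming GNU sed (\<, \> and \n in the replacement are GNU extensions):

```shell
# Substitution only: every alphabetic word becomes "(word)" on a line of its own.
# The second sed prints each resulting line with l, so empty lines and stray
# spaces stay visible.
printf '%s\n' 'abc def 123 (ghi)' |
  sed -E 's/\<[[:alpha:]]+\>/\n(&)\n/g' |
  sed -n l

# Full command: only lines beginning with "(letters)" are printed.
printf '%s\n' 'abc def 123 (ghi)' |
  sed -E '/\n/!s/\<[[:alpha:]]+\>/\n(&)\n/g;/^\([[:alpha:]]+\)/P;D'
```

The first pipeline shows an empty first line and leftover fragments such as " 123 (" and a lone ")"; the /^\([[:alpha:]]+\)/P filter in the full command prints only (abc), (def) and (ghi).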
sed -E '/\n/!s/https?\S+/\n&\n/g;/^https?/P;D' file
Use the -E command-line option to get extended regexps.
/\n/!s/https?\S+/\n&\n/g: if the current line does not contain any newlines, globally replace each string that begins with http and an optional s by that same string surrounded by newlines.
/^https?/P: if the pattern space begins with http and an optional s, print up to and including the first newline.
D: if the pattern space contains a newline, delete up to and including the first newline and restart the sed cycle without reading the next line of input; otherwise, delete the whole pattern space and start a normal new cycle.
Thus, the first time through, the substitution takes place, and thereafter the printing and deleting occur. The pattern space shrinks each time it is processed until no newline is left, and then the next line of input is read into the pattern space.
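Run against the $in string from the question, the URL command prints just the two URLs. A sketch assuming GNU sed (\S is a GNU extension); printf with a quoted "$in" is used so the value reaches sed unchanged:

```shell
in="https://www.example.com/user1 ddsf none http://www.example.com/user2 kbu7f7yy"

# Cycle 1 only (no newline yet): s puts each URL on its own line in the pattern space.
# Every cycle: P prints the first line if it starts with http or https, then D drops
# that line and restarts without reading input; once no newline is left, D discards
# the rest and sed moves on to the next input line.
printf '%s\n' "$in" | sed -E '/\n/!s/https?\S+/\n&\n/g;/^https?/P;D'
```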
Upvotes: 1
Reputation: 242443
Replace every run of non-letters, as well as the beginning and the end of the line, by ) (, then remove the surplus parentheses:
sed -r 's/[^a-z]+|^|$/) (/g;s/^\) | \($//g'
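As written, the command joins the words on a single line, separated by spaces. If each word should go on its own line, as the question asks, the same idea works with a newline in place of the space. A sketch assuming GNU sed, where \n in the replacement is a GNU extension:

```shell
# 1. Replace the start of the line, the end of the line and every run of
#    non-letters by ")\n(".
# 2. Remove the surplus ")\n" at the start and "\n(" at the end.
printf '%s\n' 'abc def 123 (ghi)' |
  sed -r 's/[^a-z]+|^|$/)\n(/g;s/^\)\n|\n\($//g'
```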
But I find the following Perl solution more readable:
perl -lne 'print "($1)" while /([a-z]+)/g'
-n reads the input line by line and runs the code for each line
-l removes newlines from input and adds them to output
Upvotes: 2