Reputation: 786
I've been assigned some sed homework in my class and am one step away from finishing the assignment. I've racked my brain trying to come up with a solution, and nothing has worked, to the point where I'm about to give up.
Basically, in the file I've got...I'm supposed to replace this:
<b>Some text here...each bold tag has different content...</b>
with
Some text here...each bold tag has different content...
I've got it partially completed, but what I can't figure out is how to "echo" the extracted content using sed (regexp).
I manage to substitute the content out just fine, but it's when I'm trying to actually OUTPUT the content that's between the HTML tags that it goes wrong.
If that's confusing, I truly apologize. I've been at this project a couple of hours now and am getting a bit frustrated. Basically, why does this not work?
s/<b>.*<\/b>/.*/g
I simply want to output the content WITHOUT the bold tags.
Thanks a bunch!
Upvotes: 0
Views: 3756
Reputation: 9913
You need to use a capturing group, which is written with parentheses ().
So, it's just this:
s/<b>(.*)<\/b>/\1/g
Capturing groups are numbered, from left to right, starting with one, and increasing.
That is the extended regular expression syntax. By default sed uses basic regular expressions, where the group parentheses must be escaped, so the sed command is
sed 's/<b>\(.*\)<\/b>/\1/g' [file]
or
sed -r 's/<b>(.*)<\/b>/\1/g' [file]
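To see the two forms side by side, here is a minimal sketch run against a made-up sample line (the `line` variable and its contents are only for illustration). It uses `-E` rather than `-r` for the extended form, on the assumption that your sed accepts it; both GNU and BSD sed do.

```shell
# Hypothetical sample line; any text between <b> and </b> would do.
line='<b>Some text here</b>'

# Basic regular expressions (sed's default): group parentheses are escaped.
bre=$(printf '%s\n' "$line" | sed 's/<b>\(.*\)<\/b>/\1/g')

# Extended regular expressions (-E): group parentheses are written bare.
ere=$(printf '%s\n' "$line" | sed -E 's/<b>(.*)<\/b>/\1/g')

# Both variables now hold the text without the bold tags.
printf '%s\n' "$bre" "$ere"
```

In both commands `\1` in the replacement stands for whatever the first group matched, which is why the text survives while the tags around it are dropped.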
Of course, if you just want to remove the bold tags, the other solution would be to replace every HTML tag with nothing, like so:
sed 's/<\([^>"]\|"[^"]*"\)*>//g' [file]
(I dislike sed's need to escape everything.) Without the escaping, the pattern is:
s/<([^>"]|"[^"]*")*>//g
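As a rough sketch of the tag-stripping idea, the following runs it in extended syntax on a made-up line (the `html` variable and its contents are hypothetical). The quoted alternative is written as `"[^"]*"` so that an attribute value of any length, even one containing `>`, is skipped as a unit. Note that this removes every tag on the line, not just the bold ones.

```shell
# Hypothetical sample input with a ">" inside a quoted attribute value.
html='<b>bold</b> and <a href="x>y">link</a>'

# Inside a tag, accept either a character that is neither ">" nor a quote,
# or a complete double-quoted string; then require the closing ">".
stripped=$(printf '%s\n' "$html" | sed -E 's/<([^>"]|"[^"]*")*>//g')

printf '%s\n' "$stripped"
```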
Upvotes: 1
Reputation: 208665
If you want to reference a part of your regex match in the replacement, you need to place that portion of the regex into a capturing group, and then refer to it using the group number preceded by a backslash. Try the following:
s/<b>\(.*\)<\/b>/\1/g
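One thing to watch for: `.*` is greedy, so if a line holds more than one bold tag the group runs from the first `<b>` to the last `</b>`. A small sketch with a made-up line (the variable names are only for illustration) shows the difference, using `[^<]*` as one possible alternative, on the assumption that the bold text itself contains no other tags.

```shell
# Hypothetical line with two bold tags.
line='<b>one</b> and <b>two</b>'

# Greedy .* captures everything between the first <b> and the last </b>,
# so the inner tags are left in place.
greedy=$(printf '%s\n' "$line" | sed 's/<b>\(.*\)<\/b>/\1/g')

# [^<]* stops at the next "<", so each tag pair is handled separately.
narrow=$(printf '%s\n' "$line" | sed 's/<b>\([^<]*\)<\/b>/\1/g')

printf '%s\n' "$greedy" "$narrow"
```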
Upvotes: 1
Reputation: 27470
I think this question is best answered by sed's documentation, for example: http://www.grymoire.com/Unix/Sed.html#uh-4
Upvotes: -1