Reputation: 343
I have a file that alternates HTML-style comments with its real text:
<!-- Here's a first line -->
Here's a first line
<!-- Here's a second line -->
Here's a third line
If a comment is identical to the following line apart from the tags themselves, I want to delete it, but otherwise leave it:
Here's a first line
<!-- Here's a second line -->
Here's a third line
I've read the similar questions here, but have been unable to adapt their solutions to my situation.
Upvotes: 0
Views: 71
Reputation: 58351
This might work for you (GNU sed):
sed -r '$!N;/<!-- (.*) -->\n\1$/!P;D' file
This slides a two-line window over the file. If the first line of the pair is a comment whose text is identical to the second line, the first line is not printed; otherwise it is printed as usual.
N.B. This also caters for consecutive comment lines.
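To make the point about consecutive comment lines concrete, here is a small sketch that wraps the command above in a shell function and feeds it two comments in a row. The function name `strip_dup_comments` and the sample input are made up for illustration, and GNU sed is assumed, as in the answer.

```shell
# Hypothetical wrapper around the command from the answer (GNU sed assumed).
strip_dup_comments() {
  sed -r '$!N;/<!-- (.*) -->\n\1$/!P;D' "$@"
}

# Two comments in a row: only the one duplicating the line after it is dropped.
printf '%s\n' '<!-- Foo -->' '<!-- Bar -->' 'Bar' 'Baz' | strip_dup_comments
# prints:
#   <!-- Foo -->
#   Bar
#   Baz
```

Because `D` restarts the cycle with whatever is left in the pattern space, every line gets a turn as the first line of a pair, which is why the second comment is still checked against `Bar`.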
Upvotes: 1
Reputation: 95614
sed '/^<!-- \(.*\) -->$/N;s/^<!-- \(.*\) -->\n\1$/\1/'
#
#    /^<!-- \(.*\) -->$/         match an HTML comment on its own line, in which case
#    N;                          add the next line to the pattern space and keep going
#
#    s/^<!-- \(.*\) -->\n\1$/    detect a comment as you described
#    \1/                         and replace it appropriately
As shown:
$ sed '/^<!-- \(.*\) -->$/N;s/^<!-- \(.*\) -->\n\1$/\1/' <<EOF
> <!-- Foo -->
> Foo
> <!-- Bar -->
> Baz
> <!-- Quux -->
> Quux
>
> Something
> Something
> Another something
> EOF
Gives:
Foo
<!-- Bar -->
Baz
Quux

Something
Something
Another something
You may need to tweak this to handle indentation, but that shouldn't be too surprising. You may also want to switch to sed -r, which requires that the parentheses NOT be escaped.
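As a rough sketch of both tweaks together, the variant below uses extended regular expressions (`-E`, which GNU sed treats the same as `-r`) and tolerates leading spaces or tabs on either line. The function name `strip_dup_comments_indented` is hypothetical, and the choices that "indentation" means leading blanks and that the text line keeps its own indentation are assumptions, not something the question specifies.

```shell
# Sketch only: assumes GNU sed and that indentation means leading spaces/tabs.
strip_dup_comments_indented() {
  # Group 1 is the comment text; group 2 is the following line, including
  # its own indentation, which is what survives the substitution.
  sed -E '/^[[:blank:]]*<!-- (.*) -->$/N;s/^[[:blank:]]*<!-- (.*) -->\n([[:blank:]]*\1)$/\2/' "$@"
}

printf '%s\n' '  <!-- Foo -->' '  Foo' '<!-- Bar -->' 'Baz' | strip_dup_comments_indented
# prints:
#     Foo
#   <!-- Bar -->
#   Baz
```

Note that with `-E` the grouping parentheses are written bare, while the backreference `\1` is still escaped.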
Upvotes: 1
Reputation: 784908
You can use this awk:
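# h holds a pending comment line, s holds its text with the tags removed
awk '/^<!-- .* -->$/ {if (h != "") print h; h = s = $0; sub(/^<!-- /, "", s); sub(/ -->$/, "", s); next}
{if (h != "" && $0 != s) print h; h = ""; print}
END {if (h != "") print h}' file.html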
Here's a first line
<!-- Here's a second line -->
Here's a third line
Upvotes: 1