Reputation: 5958
Trying to remove a specific html tag from a file.
Question:
file: test1.txt
Hello World
</body>
</html>
The sed command I tried:
sed -e 's/<\/body>\\n<\/html>\\n//' test1.txt > test2.txt
Desired result in test2.txt
Hello World
Actual result in test2.txt
Hello World
</body>
</html>
Upvotes: 1
Views: 414
Reputation: 133458
With your shown samples, you could try the following in awk (if that is OK). It sets RS to ^$ so that the whole file is read as a single record, then uses awk's match function to find the string that contains the newlines and prints everything before and after it.
awk -v RS="^$" '
match($0,/(^|\n)<\/body>\n<\/html>/){
print substr($0,1,RSTART-1) substr($0,RSTART+RLENGTH)
}
' Input_file
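The command above prints only when the pattern matches, and it relies on an awk (such as GNU awk) that treats a multi-character RS as a regular expression. The following is a minimal sketch of a variant that collects the lines itself and prints the input unchanged when nothing matches. The helper name strip_closing_pair and the temporary file are assumptions made for illustration, not part of the answer.

```shell
#!/bin/sh
# Sketch: remove one "</body>\n</html>\n" pair and leave all other input untouched.
# strip_closing_pair is a hypothetical helper name used only for this example.
strip_closing_pair() {
  awk '
    { buf = buf $0 "\n" }                 # collect every line into buf
    END {
      if (match(buf, /(^|\n)<\/body>\n<\/html>\n/)) {
        start = RSTART; len = RLENGTH
        # If the match starts with the newline ending the previous line, keep it.
        if (substr(buf, start, 1) == "\n") { start++; len-- }
        buf = substr(buf, 1, start - 1) substr(buf, start + len)
      }
      printf "%s", buf                    # print even when nothing matched
    }
  ' "$1"
}

# Sample input assumed to match test1.txt from the question.
tmp=$(mktemp -d)
printf 'Hello World\n</body>\n</html>\n' > "$tmp/test1.txt"
strip_closing_pair "$tmp/test1.txt"       # prints: Hello World
```

Because the lines are accumulated in an ordinary variable, this version does not depend on how a particular awk interprets RS.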
Upvotes: 4
Reputation: 784998
Should I be using the sed command for desired results?
Actually, grep suits this better:
grep -Ev '</(body|html)>' file
Hello World
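One thing to keep in mind with the grep approach is that -v drops every line that contains either tag, not only an adjacent </body> and </html> pair. A small sketch illustrates this. The sample input and the helper name drop_tag_lines are made up for the example.

```shell
#!/bin/sh
# drop_tag_lines is a hypothetical wrapper around the grep command shown above.
drop_tag_lines() { grep -Ev '</(body|html)>'; }

# The second line has a closing tag in the middle of other text; it is removed too.
printf 'Hello World\n<p>text </body> more</p>\n</body>\n</html>\n' | drop_tag_lines
# prints: Hello World
```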
If you want to remove only the specific </body>\n</html>\n string, then use this sed command, which would work with any version of sed:
sed '/<\/body>/{N; /<\/html>/ {N; s~</body>\n</html>\n~~;};}' file
Hello World
Upvotes: 3
Reputation: 163217
Another variant using sed:
sed '/<\/body>/{N;/\n<\/html>/d}' test1.txt > test2.txt
Match </body> and pull the next line into the pattern space using N. Then match on a newline followed by </html>.
If that matches, use d to delete what is in the pattern space.
The content of file 'test2.txt'
Hello World
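To see that this command removes only an adjacent pair, here is a small sketch that runs the same sed expression on two inputs. The helper name strip_pair and both sample files are assumptions made for illustration, and the expression is assumed to run under GNU sed, as in the answer.

```shell
#!/bin/sh
# strip_pair is a hypothetical wrapper around the sed expression from the answer.
strip_pair() { sed '/<\/body>/{N;/\n<\/html>/d}' "$1"; }

tmp=$(mktemp -d)
# Case 1: </html> directly follows </body>, so both lines are deleted.
printf 'Hello World\n</body>\n</html>\n' > "$tmp/adjacent.txt"
# Case 2: another line sits between the tags, so nothing is deleted.
printf 'keep\n</body>\nmore text\n</html>\n' > "$tmp/separated.txt"

strip_pair "$tmp/adjacent.txt"     # prints: Hello World
strip_pair "$tmp/separated.txt"    # prints all four lines unchanged
```

In the second case the pattern space holds </body> plus "more text" after N, so the test for a newline followed by </html> fails and the lines are printed as they are.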
Upvotes: 3
Reputation: 626738
With GNU sed, you can use the -z option to match newlines:
sed -z -i 's#</body>\n</html>##g' file
Note that # is chosen as the regex delimiter to avoid over-escaping /. Also, -i makes the changes directly in the input file.
See an online demo:
#!/bin/bash
s='Hello World
</body>
</html>'
sed -z 's#</body>\n</html>##g' <<< "$s"
Output:
Hello World
Upvotes: 3