Reputation: 11700
I have some HTML files and want to extract only the lines containing these tags:
head
p
I used sed to extract these parts of the files, as follows:
grep "<head>" myfile.html | sed -e 's%\(head\)\(.*\)\(/head\)%title\2\/title%'
grep "<p>" myfile.html | sed -e 's%\(<p>\)\(.*\)\(</p\)\(>\)%\2\\%'
Everything works, but I get a "\" character at the end of each line. How can I fix this?
Upvotes: 0
Views: 132
Reputation: 1433
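Don't put \\ at the end of the replacement string. In a sed replacement, \\ stands for a literal backslash, so sed appends one to every line. Use just \2: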
grep "<p>" myfile.html | sed -e 's%\(<p>\)\(.*\)\(</p\)\(>\)%\2%'
Upvotes: 1
Reputation: 360105
In this command, the double backslash in the replacement tells sed to output a literal backslash after the captured text:
sed -e 's%\(<p>\)\(.*\)\(</p\)\(>\)%\2\\%'
Try removing the backslashes:
sed -e 's%\(<p>\)\(.*\)\(</p\)\(>\)%\2%'
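To see where the trailing backslash comes from, here is a minimal sketch that runs the original and the corrected replacement on a single sample line. The sample text is made up for illustration. Inside single quotes the shell passes \\ to sed unchanged, and sed turns it into one literal backslash in the output.

```shell
# Hypothetical sample line, used only for illustration.
line='<p>Hello world</p>'

# Original replacement: group 2 followed by \\ (an escaped, literal backslash).
with_backslash=$(printf '%s\n' "$line" | sed -e 's%\(<p>\)\(.*\)\(</p\)\(>\)%\2\\%')

# Corrected replacement: only the captured text in group 2.
without_backslash=$(printf '%s\n' "$line" | sed -e 's%\(<p>\)\(.*\)\(</p\)\(>\)%\2%')

printf '%s\n' "$with_backslash"     # Hello world\
printf '%s\n' "$without_backslash"  # Hello world
```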
Also, you don't need grep:
sed -ne '/<p>/{s%\(<p>\)\(.*\)\(</p\)\(>\)%\2%;p;}'
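Here is a small sketch of the grep-free version on a made-up document (the html variable is a hypothetical stand-in for myfile.html). The -n option turns off sed's automatic printing, the /<p>/ address does the filtering grep was doing, and p prints each selected line after the substitution. The ";" before "}" is written out because POSIX sed expects it there; GNU sed also accepts the form without it.

```shell
# Hypothetical sample document; in practice this would be myfile.html.
html='<html>
<head>Page title</head>
<body>
<p>First paragraph</p>
<div>not a paragraph</div>
<p>Second paragraph</p>
</body>
</html>'

# -n: print nothing by default.
# /<p>/{...}: run the block only on lines containing <p>.
# s...%\2%: keep the text between <p> and </p>; p: print the result.
paragraphs=$(printf '%s\n' "$html" | sed -ne '/<p>/{s%\(<p>\)\(.*\)\(</p\)\(>\)%\2%;p;}')

printf '%s\n' "$paragraphs"
# First paragraph
# Second paragraph
```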
Upvotes: 2