Hakim

Reputation: 11700

Strange output from sed

I have some HTML files and want to extract only the lines containing these tags:

<head>
<p>

I used sed to extract these parts of the files, as follows:

grep "<head>" myfile.html | sed -e 's%\(head\)\(.*\)\(/head\)%title\2\/title%'

grep "<p>" myfile.html | sed -e 's%\(<p>\)\(.*\)\(</p\)\(>\)%\2\\%'

Everything works, but with the second command I get a "\" character at the end of each line. How can I get rid of it?

Upvotes: 0

Views: 132

Answers (2)

lynxlynxlynx

Reputation: 1433

Don't put \\ at the end of the replacement string. It is an escaped backslash, so sed prints a literal \ there:

grep "<p>" myfile.html | sed -e 's%\(<p>\)\(.*\)\(</p\)\(>\)%\2%'
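
To see the difference side by side, here is a minimal sketch. The sample line is made up for illustration (the real myfile.html is not shown in the question), and the variable names are hypothetical:

```shell
# Hypothetical sample line standing in for one line of myfile.html.
line='<p>Hello world</p>'

# Original replacement: \2 followed by \\ (an escaped backslash),
# so sed appends a literal "\" after the captured text.
with_backslash=$(printf '%s\n' "$line" | sed -e 's%\(<p>\)\(.*\)\(</p\)\(>\)%\2\\%')

# Fixed replacement: only the second capture group is kept.
without_backslash=$(printf '%s\n' "$line" | sed -e 's%\(<p>\)\(.*\)\(</p\)\(>\)%\2%')

printf '%s\n' "$with_backslash"     # Hello world\
printf '%s\n' "$without_backslash"  # Hello world
```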

Upvotes: 1

Dennis Williamson

Reputation: 360105

In this command, the double backslash at the end of the replacement tells sed to append a literal backslash:

sed -e 's%\(<p>\)\(.*\)\(</p\)\(>\)%\2\\%'

Try removing the backslashes:

sed -e 's%\(<p>\)\(.*\)\(</p\)\(>\)%\2%'

Also, you don't need grep:

sed -ne '/<p>/{s%\(<p>\)\(.*\)\(</p\)\(>\)%\2%;p}'
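
A slightly shorter variant, offered as a sketch rather than a drop-in replacement: since only the text between the tags is kept, one capture group is enough, and the p flag on the s command prints a line only when the substitution succeeded. Unlike the command above, it stays silent for a line that contains <p> but no closing </p>. The sample file contents here are invented for the demonstration:

```shell
# Hypothetical sample file; the real myfile.html is not shown in the question.
tmp=$(mktemp)
printf '%s\n' '<html>' '<p>first</p>' '<div>skip</div>' '<p>second</p>' '</html>' > "$tmp"

# -n suppresses sed's automatic printing; the trailing p flag prints
# only the lines on which the substitution was actually made.
result=$(sed -n 's%<p>\(.*\)</p>%\1%p' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

With the sample file above, this prints the two paragraph bodies, first and second, one per line.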

Upvotes: 2
