sed'ing out HTML section

Question

I have a long HTML table output consisting of dozens of records. An example looks like this:

Now I want to extract the section that contains the user belonging to 90687, so I type:

sed my_html_file -e '/window.location.*90687/,/window.location/ !d'

Unfortunately, it also fetches the first line of the next section, which I would like to avoid. I went through 101 sed and awk tricks, but the only solution I found is

sed my_html_file -e '/window.location.*90687/,+9 !d'

which means fetching the 9 lines after the matching line. The problem is that I cannot rely on "9" or any other fixed number. Is there any way to solve this with sed? BTW, I am specifically interested in a sed solution.
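To make the problem concrete, here is a minimal reproduction on hypothetical rows. The original example table is not shown, so the markup, the user.php URL and the ids 90686/90688 are assumptions. The command is the question's range with the dots escaped and the file replaced by a pipe. A regex end address is only tested starting from the line after the start line, and the line that matches it belongs to the range, so the first line of the next record is printed too.

```shell
# Hypothetical input (not the original poster's data): three records, each
# starting with a <tr> line that carries window.location and a user id.
sample() {
  cat <<'EOF'
<tr onclick="window.location='user.php?id=90686'">
<td>alice</td>
</tr>
<tr onclick="window.location='user.php?id=90687'">
<td>bob</td>
</tr>
<tr onclick="window.location='user.php?id=90688'">
<td>carol</td>
</tr>
EOF
}

# The question's range: it ends ON the next window.location line, and that
# line is part of the range, so "!d" keeps it as well.
leaky=$(sample | sed -e '/window\.location.*90687/,/window\.location/!d')
printf '%s\n' "$leaky"
```

The output is the three lines of the 90687 record followed by the opening line of the 90688 record.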

Naoric · Accepted Answer

If you are not sure whether the closing </tr> might be on the same line as the start of the following record, you can try this:

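sed -n -E '/window\.location.*90687/,/<\/tr>/ {
  # lines before the closing </tr>: print unchanged
  /<\/tr>/!p
  # the closing line: keep everything up to its last </tr>, drop the rest, print
  /<\/tr>/s/(.*)<\/tr>.*$/\1<\/tr>/p
}' input.txt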
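A quick way to check the answer's script is to feed it input where the closing </tr> of the 90687 record shares a line with the opening of the next record. The rows below are hypothetical (the markup, the user.php URL and the ids are assumptions), and the script is the answer's, written with one command per line.

```shell
# Hypothetical input (not from the thread): the closing </tr> of the 90687
# record sits on the same line as the opening <tr> of the 90688 record.
sample() {
  cat <<'EOF'
<tr onclick="window.location='user.php?id=90686'">
<td>alice</td></tr>
<tr onclick="window.location='user.php?id=90687'">
<td>bob</td>
<td>admin</td></tr><tr onclick="window.location='user.php?id=90688'">
<td>carol</td></tr>
EOF
}

# Range from the 90687 line to the first line containing </tr>:
# lines without </tr> are printed as-is; the closing line is cut after
# its last </tr> and printed by the s command's p flag.
result=$(sample | sed -n -E '/window\.location.*90687/,/<\/tr>/ {
/<\/tr>/!p
/<\/tr>/s/(.*)<\/tr>.*$/\1<\/tr>/p
}')
printf '%s\n' "$result"
```

The last printed line is `<td>admin</td></tr>`; the trailing `<tr onclick=...90688...>` fragment is removed. Because `(.*)` is greedy, a line with several </tr> tags is cut after the last one, not the first.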

Though there are probably more elegant solutions, this also handles a closing </tr> that shares a line with the start of the next record.
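For input where every record starts on its own line, one alternative (not from the thread, offered as a sketch) avoids both the fixed "+9" count and the leaked line: a small loop with the standard n and q commands that prints from the 90687 line and quits as soon as the next window.location line is read. The sample rows are hypothetical, as in the examples above.

```shell
# Hypothetical input (not from the thread): every record starts on its own
# line containing window.location.
sample() {
  cat <<'EOF'
<tr onclick="window.location='user.php?id=90686'">
<td>alice</td>
</tr>
<tr onclick="window.location='user.php?id=90687'">
<td>bob</td>
</tr>
<tr onclick="window.location='user.php?id=90688'">
<td>carol</td>
</tr>
EOF
}

# From the 90687 line on: print the line, read the next one with n
# (nothing is auto-printed under -n), and quit as soon as the new line
# contains window.location, so the next record's first line is never printed.
section=$(sample | sed -n '/window\.location.*90687/{
:a
p
n
/window\.location/q
ba
}')
printf '%s\n' "$section"
```

This works on whole lines and stops after the first matching record. If a closing </tr> shares a line with the next record's window.location, that whole line is dropped, so the answer's trimming approach is the better fit for such input.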
