Reputation: 111
I have a long HTML table output consisting of dozens of records. An example looks like this:
<tr onclick="window.location='/team/90687/';" style="cursor: pointer;" class="">
<td class="number">163124</td>
<td class="img">3</td>
<td class="user">
<span class="name">Mosse John</span>
</td>
<td class="number">3332</td>
<td class="number">497</td>
<td class="number">20</td>
</tr>
<tr onclick="window.location='/team/342465/';" style="cursor: pointer;" class="">
<td class="number">163124</td>
<td class="img">2</td>
<td class="user">
<span class="name">Sus Peter</span>
</td>
<td class="number">3332</td>
<td class="number">450</td>
<td class="number">20</td>
</tr>
Now I want to extract the section that contains the user belonging to team 90687, so I type:
sed my_html_file -e '/window.location.*90687/,/window.location/ !d'
Unfortunately, it also fetches the first line of the next section, which I would like to avoid. I went through 101 sed and awk tricks, but the only solution I found is:
sed my_html_file -e '/window.location.*90687/,+9 !d'
which means fetching the 9 lines after the pattern. The problem is that I cannot rely on "9" or any other fixed number. Is there any way to solve this with sed? BTW, I am strongly interested in a sed solution.
Upvotes: 1
Views: 56
Reputation: 355
If you are not sure whether the closing </tr> might be on the same line as the start of the following record, you can try this:
sed -n -E '/window\.location.*90687/,/<\/tr>/ {
/<\/tr>/! { p }
/<\/tr>/ { s/(.*)<\/tr>.*$/\1<\/tr>/ p } }
' input.txt
There are probably more elegant solutions, but this also handles input like this:
<tr onclick="window.location='/team/90687/';" style="cursor: pointer;" class="">
<td class="number">163124</td>
<td class="img">3</td>
<td class="user">
<span class="name">Mosse John</span>
</td>
<td class="number">3332</td>
<td class="number">497</td>
<!-- Confusing Row -->
<td class="number">20</td></tr> <tr onclick="window.location='/team/342465/';" style="cursor: pointer;" class="">
<td class="number">163124</td>
<td class="img">2</td>
<td class="user">
<span class="name">Sus Peter</span>
</td>
<td class="number">3332</td>
<td class="number">450</td>
<td class="number">20</td>
</tr>
Upvotes: 1
Reputation: 8314
Simple solution for your data:
sed my_html_file -e '/window.location.*90687/,/<\/tr>/ !d'
This will print all the lines from the match up to and including the first closing </tr> tag.
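To see the range address in action, here is a minimal sketch run against a shortened copy of the question's data. The file name sample.html and the variable name result are assumptions made for illustration, and the command uses the equivalent `-n` ... `p` form instead of `!d`.

```shell
# Hypothetical sample file: a shortened copy of the two records from the question.
cat > sample.html <<'EOF'
<tr onclick="window.location='/team/90687/';" style="cursor: pointer;" class="">
<td class="number">163124</td>
<td class="user">
<span class="name">Mosse John</span>
</td>
</tr>
<tr onclick="window.location='/team/342465/';" style="cursor: pointer;" class="">
<td class="number">163124</td>
<td class="user">
<span class="name">Sus Peter</span>
</td>
</tr>
EOF

# -n suppresses automatic printing; p prints only the lines inside the range.
# The range opens at the row for team 90687 and closes at the first </tr> after it.
result=$(sed -n '/window\.location.*90687/,/<\/tr>/p' sample.html)
printf '%s\n' "$result"
```

This relies on the closing </tr> sitting on its own line, as it does in the question's data.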
More complex solution:
sed my_html_file -n -e '/window.location.*90687/,/window.location/ { H;x; /window.location.*window.location/ !{ x;p }} '
This will print all the lines until the second window.location is met, without printing that line itself.
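The same "stop before the next window.location" idea can also be sketched with an explicit loop instead of the hold space. This is an alternative under stated assumptions, not the command above: the file name sample.html and the variable name section are hypothetical, and the script is split into separate -e expressions so that the label and the braces are parsed the same way across sed implementations.

```shell
# Hypothetical sample file: a shortened copy of the two records from the question.
cat > sample.html <<'EOF'
<tr onclick="window.location='/team/90687/';" style="cursor: pointer;" class="">
<td class="number">163124</td>
<td class="user">
<span class="name">Mosse John</span>
</td>
</tr>
<tr onclick="window.location='/team/342465/';" style="cursor: pointer;" class="">
<td class="number">163124</td>
<td class="user">
<span class="name">Sus Peter</span>
</td>
</tr>
EOF

# On the matching row: print it, then keep reading lines (n) and printing them (p)
# until a line containing window.location shows up, at which point sed quits (q)
# without printing that line.
section=$(sed -n \
  -e '/window\.location.*90687/{' \
  -e 'p' \
  -e ':a' \
  -e 'n' \
  -e '/window\.location/q' \
  -e 'p' \
  -e 'ba' \
  -e '}' sample.html)
printf '%s\n' "$section"
```

Because sed quits at the next record, the rest of the file is never read, which can matter for long tables.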
Upvotes: 1