Reputation: 1357
I have an HTML file like the one below:
<tr>
<td>SOMETHING1</td>
<td>SOMETHING2</td>
<td>SOMETHING3</td>
</tr>
<tr>
<td>SOMETHING1</td>
<td>SOMETHING2</td>
<td>SOMETHING3</td>
</tr>
<tr>
<td>SOMETHING1</td>
<td>SOMETHING2</td>
<td>SOMETHING3</td>
</tr>
</table>
<br>
</div>
<a href="javascript:;" onmousedown="toggleDiv('20161023');">Sunday 23 ... </a></h3>
<br>
<div class="time_div" id="20161023" style="display:none">
<p class="dep_parag">Test automation on Sunday 23 October</p>
<table id="table" border="1" cellpadding="3" cellspacing="0">
<tr>
<td>SOMETHING1</td>
<td>SOMETHING2</td>
<td>SOMETHING3</td>
</tr>
<tr>
<td>SOMETHING1</td>
<td>SOMETHING2</td>
<td>SOMETHING3</td>
</tr>
<tr>
<td>SOMETHING1</td>
<td>SOMETHING2</td>
<td>SOMETHING3</td>
</tr>
As you can see, there is a list of table rows divided by a section with some JavaScript (the section starts with </table> and finishes with cellspacing="0">).
This is just an excerpt of an HTML page containing more than 300,000 table rows!
I have to delete the section with the JavaScript, because I just need one long, clean list of table rows with nothing between them.
Doing it manually would be crazy, so I would like something (a regular expression) to clean the file in one click. (I usually run simple regular expressions in Notepad++, but this one is a little hard for me.)
I was thinking of something like:
delete all the lines from </table> to cellspacing="0">
Or
delete all the lines from </table> and the following 8 lines.
Could someone kindly help me with this regex?
Thanks a lot! :)
Upvotes: 0
Views: 131
Reputation: 161
Assuming that you are not fussed about irregular whitespace, how about a search pattern of:
</table>.*?<table.*?>
With an empty "Replace with" string, tick the "Regular expression" and ". matches newline" options.
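For anyone who would rather script this than use the Notepad++ dialog, here is a minimal Python sketch of the same search and replace. It assumes the whole file fits in memory; the SAMPLE string and the strip_separators name are made up for illustration, and re.DOTALL stands in for the ". matches newline" option.

```python
import re

# Hypothetical miniature of the page: two row groups separated by the
# JavaScript toggle section that should be removed.
SAMPLE = """<tr>
<td>SOMETHING1</td>
</tr>
</table>
<br>
</div>
<a href="javascript:;" onmousedown="toggleDiv('20161023');">Sunday 23 ... </a></h3>
<br>
<div class="time_div" id="20161023" style="display:none">
<p class="dep_parag">Test automation on Sunday 23 October</p>
<table id="table" border="1" cellpadding="3" cellspacing="0">
<tr>
<td>SOMETHING2</td>
</tr>
"""

# re.DOTALL plays the role of Notepad++'s ". matches newline" option.
SEPARATOR = re.compile(r"</table>.*?<table.*?>", re.DOTALL)


def strip_separators(html: str) -> str:
    """Delete everything from each </table> through the next <table ...> tag."""
    return SEPARATOR.sub("", html)


cleaned = strip_separators(SAMPLE)
print(cleaned)
```

Note that the line break after the removed <table ...> tag is not part of the match, so an empty line stays behind where each section used to be.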
Upvotes: 2
Reputation: 516
This regular expression works with the s (single-line) flag in PHP and Python; in Java, compile the pattern with the DOTALL option.
\<\/table\>.+?(?=javascript\:\;).+?(?=\<table.+?cellspacing\=\"0\"\>)\<table.+?cellspacing\=\"0\"\>
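As a rough check, the pattern above can be tried in Python like this. It is only a sketch: ROWS, SEPARATOR and page are hypothetical test data modelled on the excerpt in the question, and re.S is Python's name for the single-line/DOTALL flag.

```python
import re

# Pattern copied verbatim from the answer; re.S is Python's DOTALL flag.
PATTERN = re.compile(
    r'\<\/table\>.+?(?=javascript\:\;).+?(?=\<table.+?cellspacing\=\"0\"\>)'
    r'\<table.+?cellspacing\=\"0\"\>',
    re.S,
)

# Hypothetical building blocks modelled on the question's excerpt.
ROWS = "<tr>\n<td>SOMETHING1</td>\n</tr>\n"
SEPARATOR = (
    "</table>\n<br>\n</div>\n"
    "<a href=\"javascript:;\" onmousedown=\"toggleDiv('20161023');\">Sunday 23 ... </a></h3>\n"
    "<br>\n"
    "<div class=\"time_div\" id=\"20161023\" style=\"display:none\">\n"
    "<table id=\"table\" border=\"1\" cellpadding=\"3\" cellspacing=\"0\">\n"
)

# Three row groups separated by two JavaScript sections.
page = ROWS + SEPARATOR + ROWS + SEPARATOR + ROWS

# Replace every separator section with nothing.
cleaned = PATTERN.sub("", page)
print(cleaned.count("<tr>"), "rows kept;", "javascript" in cleaned)
```

Unlike a plain </table>...<table> match, this version also insists on finding javascript:; and cellspacing="0" inside the removed span, which ties it more closely to this particular page layout.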
Upvotes: 1
Reputation: 1291
I don't quite understand which part you want to remove (my understanding is from </table> to cellspacing="0">?), but it should be fairly easy. Do you mean something like this?
<a href="javascript([^\n]+\r\n){5}
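To see what this pattern actually removes, here is a small Python sketch. The sample text is hypothetical and is converted to Windows (CRLF) line endings first, because the pattern looks for \r\n explicitly and would not match a file with bare \n line endings.

```python
import re

# Hypothetical excerpt; converted to CRLF below because the pattern needs \r\n.
SAMPLE_LF = """<tr>
<td>SOMETHING1</td>
</tr>
</table>
<br>
</div>
<a href="javascript:;" onmousedown="toggleDiv('20161023');">Sunday 23 ... </a></h3>
<br>
<div class="time_div" id="20161023" style="display:none">
<p class="dep_parag">Test automation on Sunday 23 October</p>
<table id="table" border="1" cellpadding="3" cellspacing="0">
<tr>
<td>SOMETHING2</td>
</tr>
"""
sample_crlf = SAMPLE_LF.replace("\n", "\r\n")

# Matches the <a href="javascript line plus the four lines after it.
PATTERN = re.compile(r'<a href="javascript([^\n]+\r\n){5}')

result = PATTERN.sub("", sample_crlf)
print(result)
```

On this sample the five lines from the <a href="javascript line through the <table ...> line disappear, but the </table>, <br> and </div> lines just before them are left in place, so a second pass would still be needed to get a clean row list.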
Upvotes: 1