ivoruJavaBoy

Reputation: 1357

Delete the line beginning with a given string and the following n lines

I have an HTML file like the one below:

      <tr>
        <td>SOMETHING1</td>
        <td>SOMETHING2</td>
        <td>SOMETHING3</td>
      </tr>
      <tr>
        <td>SOMETHING1</td>
        <td>SOMETHING2</td>
        <td>SOMETHING3</td>
      </tr>
      <tr>
        <td>SOMETHING1</td>
        <td>SOMETHING2</td>
        <td>SOMETHING3</td>
      </tr>

    </table>
    <br>
    </div>
    <a href="javascript:;" onmousedown="toggleDiv('20161023');">Sunday 23 ...   </a></h3>
    <br>
    <div class="time_div" id="20161023" style="display:none">
    <p class="dep_parag">Test automation on Sunday 23 October</p>
    <table id="table" border="1" cellpadding="3" cellspacing="0">

    <tr>
        <td>SOMETHING1</td>
        <td>SOMETHING2</td>
        <td>SOMETHING3</td>
      </tr>
      <tr>
        <td>SOMETHING1</td>
        <td>SOMETHING2</td>
        <td>SOMETHING3</td>
      </tr>
      <tr>
        <td>SOMETHING1</td>
        <td>SOMETHING2</td>
        <td>SOMETHING3</td>
      </tr>

As you can see, there is a list of table rows interrupted by a section containing some JavaScript (the section starts with </table> and ends with <table id="table" border="1" cellpadding="3" cellspacing="0">).

This is just an excerpt from an HTML page containing more than 300,000 table rows!

I have to delete the section with the JavaScript, because I need just one long, clean list of table rows with nothing between them.

Doing it manually would be crazy, so I would like something (a regular expression) that cleans the file in one click. I usually run simple regular expressions in Notepad++, but this one is a bit too hard for me.

I was thinking of something like:

delete everything from </table> to cellspacing="0">

Or

delete the line beginning with </table> and the following 8 lines.

Could someone be so kind as to help me with this regex?

Thanks a lot! :)

Upvotes: 0

Views: 131

Answers (3)

ardavey

Reputation: 161

Assuming that you are not fussed about irregular whitespace being left behind, how about a search pattern of:

</table>.*?<table.*?>

Leave the "Replace with" field empty, and tick the "Regular expression" and ". matches newline" options.
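
For anyone who prefers to script the cleanup instead of using Notepad++, here is a minimal Python sketch of the same idea. The sample text is a made-up miniature of the page in the question, and `re.DOTALL` plays the role of the ". matches newline" option; treat it as an illustration rather than a tested solution for the real 300,000-row file.

```python
import re

# Hypothetical miniature of the page from the question.
html = """<tr>
  <td>A</td>
</tr>

</table>
<br>
</div>
<a href="javascript:;" onmousedown="toggleDiv('20161023');">Sunday 23 ...   </a></h3>
<br>
<div class="time_div" id="20161023" style="display:none">
<p class="dep_parag">Test automation on Sunday 23 October</p>
<table id="table" border="1" cellpadding="3" cellspacing="0">

<tr>
  <td>B</td>
</tr>
"""

# Lazy match from a closing </table> to the end of the next opening <table ...> tag.
# re.DOTALL lets "." cross line breaks.
pattern = re.compile(r"</table>.*?<table.*?>", re.DOTALL)

# Replace every such section with nothing.
cleaned = pattern.sub("", html)
print(cleaned)
```

Blank lines around the removed section are left in place, which matches the "not fussed about irregular whitespace" assumption.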

Upvotes: 2

Vijay Wilson

Reputation: 516

This regular expression works with the s (single-line) flag in PHP and Python; in Java, compile the pattern with the DOTALL option.

\<\/table\>.+?(?=javascript\:\;).+?(?=\<table.+?cellspacing\=\"0\"\>)\<table.+?cellspacing\=\"0\"\>
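
A rough way to try this pattern from Python, assuming the file content is already in a string (the two-section sample below is invented for illustration). The pattern is copied verbatim; in Python the s flag is spelled `re.DOTALL`.

```python
import re

# Hypothetical sample with two JavaScript sections between groups of rows.
section = (
    "</table>\n<br>\n</div>\n"
    "<a href=\"javascript:;\" onmousedown=\"toggleDiv('x');\">Day</a></h3>\n"
    "<br>\n<div class=\"time_div\" style=\"display:none\">\n"
    "<p class=\"dep_parag\">Test</p>\n"
    "<table id=\"table\" border=\"1\" cellpadding=\"3\" cellspacing=\"0\">\n"
)
html = "<tr><td>A</td></tr>\n" + section + "<tr><td>B</td></tr>\n" + section + "<tr><td>C</td></tr>\n"

# The answer's pattern, unchanged; a single-quoted raw string avoids escaping the double quotes.
pattern = re.compile(
    r'\<\/table\>.+?(?=javascript\:\;).+?(?=\<table.+?cellspacing\=\"0\"\>)\<table.+?cellspacing\=\"0\"\>',
    re.DOTALL,
)

# subn returns the new text and the number of sections removed.
cleaned, removed = pattern.subn("", html)
print(removed)
print(cleaned)
```

Unlike a plain </table> ... <table> match, this version only fires when javascript:; appears between the two tags and the opening tag ends with cellspacing="0">.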

Upvotes: 1

Ben

Reputation: 1291

I don't quite understand which part you want to remove (my understanding is from </table> to cellspacing="0">?), but it should be fairly easy. Do you mean something like this?

<a href="javascript([^\n]+\r\n){5}
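
If the "given string plus the following n lines" wording from the title is what is wanted, a line-based script may be easier to reason about than a regex. The sketch below is only an illustration: the function name `drop_line_and_following` is made up, and the choice of `</table>` with n = 7 assumes every section has exactly the layout shown in the question.

```python
def drop_line_and_following(lines, prefix, n):
    """Drop each line that starts with prefix (ignoring leading whitespace)
    together with the n lines that follow it."""
    kept = []
    skip = 0  # number of upcoming lines still to be dropped
    for line in lines:
        if skip > 0:
            skip -= 1
            continue
        if line.lstrip().startswith(prefix):
            skip = n
            continue
        kept.append(line)
    return kept


# Hypothetical miniature of the page from the question.
sample = [
    "<tr>",
    "<td>A</td>",
    "</tr>",
    "</table>",
    "<br>",
    "</div>",
    '<a href="javascript:;">Sunday 23 ...</a></h3>',
    "<br>",
    '<div class="time_div" style="display:none">',
    '<p class="dep_parag">Test</p>',
    '<table id="table" border="1" cellpadding="3" cellspacing="0">',
    "<tr>",
    "<td>B</td>",
    "</tr>",
]

result = drop_line_and_following(sample, "</table>", 7)
print("\n".join(result))
```

A fixed line count is fragile: if any section has an extra or missing line, rows will be lost or debris left behind, so the tag-to-tag regex approach is probably safer for a large file.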

Upvotes: 1
