Mukesh Rajput

Reputation: 755

Remove table rows for given pattern in Java String

I want to remove every row that has N/A as the value in the last column of the given HTML (held in a Java String).

Please help me find the correct regex/pattern to remove all such rows:

<table class="overviewTable">
    <tr>
    <th colspan="6" class="header suite">
      <div class="suiteLinks">
                                        <a href="suite1_groups.html">Groups</a>
              </div>
      Test Automation
    </th>
  </tr>
  <tr class="columnHeadings">
    <td>&nbsp;</td>
    <th>Duration</th>
    <th>Passed</th>
    <th>Skipped</th>
    <th>Failed</th>
    <th>Pass Rate</th>
  </tr>
    
    <tr class="test">
    <td class="test">
      <a href="suite1_test14_results.html">Test Xyz</a>
    </td>
    <td class="duration">
      0.000s
    </td>

        <td class="zero number">0</td>
    
        <td class="zero number">0</td>
    
        <td class="zero number">0</td>
    
    <td class="passRate">
            N/A
          </td>
  </tr>
    
    <tr class="test">
    <td class="test">
      <a href="suite1_test15_results.html">Test abc XYZ</a>
    </td>
    <td class="duration">
      0.000s
    </td>

        <td class="zero number">0</td>
    
        <td class="zero number">0</td>
    
        <td class="zero number">0</td>
    
    <td class="passRate">
            N/A
          </td>
  </tr>
      
    <tr class="test">
    <td class="test">
      <a href="suite1_test17_results.html">TestAbcSuccess</a>
    </td>
    <td class="duration">
      77.582s
    </td>

        <td class="passed number">1</td>
    
        <td class="zero number">0</td>
    
        <td class="zero number">0</td>
    
    <td class="passRate">
            100%
          </td>
  </tr>
    
    <tr class="suite">
    <td colspan="2" class="totalLabel">Total</td>

        <td class="passed number">1</td>
    
        <td class="zero number">0</td>
    
        <td class="zero number">0</td>
    
    <td class="passRate suite">
            100%
          </td>

  </tr>
</table>

This is the index.html file generated from Java + Selenium + TestNG automation results.

These are my attempts:

1.

fullHtmlStr = fullHtmlStr.replaceAll("(?<=<tr class=\"test\">).*?(?=N/A\n          </td>)", "");

2.

Pattern PATTERN = Pattern.compile("<tr class=\"test\">.*$.N/A\n          </td>", Pattern.MULTILINE | Pattern.DOTALL );
Matcher m = PATTERN.matcher(fullHtmlStr);
if (m.find())
   fullHtmlStr = m.replaceAll("");

(I don't have any knowledge of regex, so please forgive me in case these are totally wrong.)

Upvotes: 0

Views: 839

Answers (2)

Mukesh Rajput

Reputation: 755

Based on the suggestion by @little-santi, I used the jsoup HTML parser to manipulate the markup. Here is my code:

            // Needs org.jsoup.Jsoup, org.jsoup.nodes.Document and org.jsoup.nodes.Element.
            Document document = Jsoup.parse(strText);
            // td:eq(5) selects every cell whose zero-based sibling index is 5 (the Pass Rate column).
            for (Element cell : document.select("td:eq(5)")) {
                // text() returns the cell's whitespace-normalized, trimmed text.
                if (cell.text().equalsIgnoreCase("N/A")) {
                    // Remove the whole <tr> that contains this cell.
                    cell.parent().remove();
                }
            }
            strText = document.toString();

Upvotes: 1

Little Santi

Reputation: 8813

I would discourage you from using a regexp for this: regular expressions are useful for matching patterns made of characters, but not patterns made of patterns.
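To make the point concrete, here is a minimal sketch of a pattern that happens to work on this exact generated layout. The class and method names (`RegexRowRemoval`, `removeNaRows`) are made up for illustration, and the pattern assumes the row always opens with exactly `<tr class="test">` and the cell with exactly `<td class="passRate">`. Any change in attribute order, an extra class, or a nested table would silently break it, which is why a parser is the safer route.

```java
import java.util.regex.Pattern;

public class RegexRowRemoval {
    // (?s) lets '.' match line breaks.
    // (?:(?!</tr>).)*? consumes characters lazily but refuses to step over a
    // closing </tr>, so a match can never span more than one row.
    static final Pattern NA_ROW = Pattern.compile(
        "(?s)<tr class=\"test\">(?:(?!</tr>).)*?"
        + "<td class=\"passRate\">\\s*N/A\\s*</td>\\s*</tr>");

    // Hypothetical helper: returns the input with every matching row removed.
    static String removeNaRows(String html) {
        return NA_ROW.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<table>\n"
            + "<tr class=\"test\"><td class=\"test\">A</td>"
            + "<td class=\"passRate\">\n   N/A\n  </td></tr>\n"
            + "<tr class=\"test\"><td class=\"test\">B</td>"
            + "<td class=\"passRate\">\n   100%\n  </td></tr>\n"
            + "</table>";
        // Row A is dropped, row B is kept.
        System.out.println(removeNaRows(html));
    }
}
```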

To process an HTML string you need a proper parser. If the input is XHTML, you can parse it directly with a DocumentBuilder. If not, you first need to convert it to XHTML, for example with the open-source library Tidy.

The parser converts your HTML string into a Document object, which you can then traverse to add or remove nodes.
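As a rough sketch of that approach, the following uses only the JDK's `DocumentBuilder` and assumes the input is already well-formed XHTML. The class and method names (`DomRowRemoval`, `removeNaRows`) are illustrative, and the sample input is a trimmed-down table, not the full report. Note that the page in the question contains `&nbsp;`, which a plain XML parser rejects as an undeclared entity, so that page would need the conversion step first.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomRowRemoval {
    // Hypothetical helper: parses XHTML, removes every row whose
    // passRate cell reads N/A, and serializes the document back to a String.
    static String removeNaRows(String xhtml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

            // Collect first, remove afterwards: the NodeList is live and
            // would shrink under the loop if rows were removed in place.
            List<Node> rowsToRemove = new ArrayList<>();
            NodeList cells = doc.getElementsByTagName("td");
            for (int i = 0; i < cells.getLength(); i++) {
                Element cell = (Element) cells.item(i);
                boolean isPassRate = "passRate".equals(cell.getAttribute("class"));
                if (isPassRate && "N/A".equals(cell.getTextContent().trim())) {
                    rowsToRemove.add(cell.getParentNode());
                }
            }
            for (Node row : rowsToRemove) {
                row.getParentNode().removeChild(row);
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not process input as XHTML", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<table>"
            + "<tr class=\"test\"><td class=\"test\">A</td><td class=\"passRate\"> N/A </td></tr>"
            + "<tr class=\"test\"><td class=\"test\">B</td><td class=\"passRate\"> 100% </td></tr>"
            + "</table>";
        System.out.println(removeNaRows(xhtml));
    }
}
```

Matching on the `class` attribute instead of the column position keeps the check independent of how many cells precede the pass rate; the exact-match test on `"passRate"` deliberately leaves the `passRate suite` total row alone.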

Upvotes: 1
