Reputation: 755
I want to remove all the rows that have N/A as the value in the last column of the given HTML (held in a Java string).
Please help me find the correct regex/pattern code to remove all such rows:
<table class="overviewTable">
<tr>
<th colspan="6" class="header suite">
<div class="suiteLinks">
<a href="suite1_groups.html">Groups</a>
</div>
Test Automation
</th>
</tr>
<tr class="columnHeadings">
<td> </td>
<th>Duration</th>
<th>Passed</th>
<th>Skipped</th>
<th>Failed</th>
<th>Pass Rate</th>
</tr>
<tr class="test">
<td class="test">
<a href="suite1_test14_results.html">Test Xyz</a>
</td>
<td class="duration">
0.000s
</td>
<td class="zero number">0</td>
<td class="zero number">0</td>
<td class="zero number">0</td>
<td class="passRate">
N/A
</td>
</tr>
<tr class="test">
<td class="test">
<a href="suite1_test15_results.html">Test abc XYZ</a>
</td>
<td class="duration">
0.000s
</td>
<td class="zero number">0</td>
<td class="zero number">0</td>
<td class="zero number">0</td>
<td class="passRate">
N/A
</td>
</tr>
<tr class="test">
<td class="test">
<a href="suite1_test17_results.html">TestAbcSuccess</a>
</td>
<td class="duration">
77.582s
</td>
<td class="passed number">1</td>
<td class="zero number">0</td>
<td class="zero number">0</td>
<td class="passRate">
100%
</td>
</tr>
<tr class="suite">
<td colspan="2" class="totalLabel">Total</td>
<td class="passed number">1</td>
<td class="zero number">0</td>
<td class="zero number">0</td>
<td class="passRate suite">
100%
</td>
</tr>
</table>
This is the index.html file of Java + Selenium + TestNG automation results.
These are my attempts so far:
1.
fullHtmlStr = fullHtmlStr.replaceAll("(?<=<tr class=\"test\">).*?(?=N/A\n </td>)", "");
2.
Pattern PATTERN = Pattern.compile("<tr class=\"test\">.*$.N/A\n </td>", Pattern.MULTILINE | Pattern.DOTALL );
Matcher m = PATTERN.matcher(fullHtmlStr);
if (m.find())
fullHtmlStr = m.replaceAll("");
(I don't have any knowledge of regex, so please forgive me if these are totally wrong.)
Upvotes: 0
Views: 839
Reputation: 755
Based on the suggestion by @little-santi, I used the jsoup HTML parser to manipulate the markup. Here is my code:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Document document = Jsoup.parse(strText);
// td:eq(5) selects each <td> whose zero-based sibling index is 5, i.e. the sixth cell of its row
for (Element cell : document.select("td:eq(5)")) {
    // Remove the whole row when the cell's own text is "N/A"
    if (cell.ownText().trim().equalsIgnoreCase("N/A")) {
        cell.parent().remove();
    }
}
strText = document.toString();
Upvotes: 1
Reputation: 8813
I'd advise against using a regex for this: regular expressions are good at matching patterns made of characters, but not patterns made of patterns, such as the nested elements of HTML.
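To illustrate the kind of failure this can cause, here is a small sketch of a naive regex that looks reasonable but over-matches. The pattern and the class name `RegexOverMatchDemo` are hypothetical and written only for this demonstration; they are not the asker's exact attempts.

```java
import java.util.regex.Pattern;

final class RegexOverMatchDemo {

    // Hypothetical naive pattern: from an opening test row up to the nearest "N/A" cell and its closing </tr>.
    // DOTALL lets '.' cross line breaks, which multi-line HTML requires.
    static final Pattern NAIVE =
            Pattern.compile("<tr class=\"test\">.*?N/A\\s*</td>\\s*</tr>", Pattern.DOTALL);

    // Deletes every match of the naive pattern from the input.
    static String stripNaive(String html) {
        return NAIVE.matcher(html).replaceAll("");
    }
}
```

Given a passing row followed by an N/A row, the match starts at the first `<tr class="test">` and the lazy `.*?` keeps extending until it reaches the N/A cell in the second row, so both rows are deleted. The regex has no notion of where one row ends and the next begins.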
To process an HTML string you need a proper parser. If it is XHTML, you can parse it directly with a DocumentBuilder. If not, you first need to convert it to XHTML with the open-source library Tidy.
The parser converts your HTML string into a Document object, which you can then traverse to add or remove nodes.
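The DocumentBuilder route could look roughly like the sketch below. It assumes the input is already well-formed XHTML with no undeclared entities such as `&nbsp;` (otherwise the parser throws), and the class and method names (`NaRowRemover`, `removeNaRows`) are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NaRowRemover {

    /** Removes every <tr> whose last cell is a <td> containing only "N/A". Assumes well-formed XHTML. */
    static String removeNaRows(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            // The NodeList is live, so collect matching rows first and remove them afterwards.
            NodeList rows = doc.getElementsByTagName("tr");
            List<Element> toRemove = new ArrayList<>();
            for (int i = 0; i < rows.getLength(); i++) {
                Element row = (Element) rows.item(i);
                Element lastCell = lastChildElement(row);
                if (lastCell != null
                        && "td".equals(lastCell.getTagName())
                        && "N/A".equals(lastCell.getTextContent().trim())) {
                    toRemove.add(row);
                }
            }
            for (Element row : toRemove) {
                row.getParentNode().removeChild(row);
            }

            // Serialize the modified tree back to a string, without an XML declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Input could not be processed as XHTML", e);
        }
    }

    // Returns the last child that is an element, skipping whitespace text nodes.
    private static Element lastChildElement(Element parent) {
        for (Node n = parent.getLastChild(); n != null; n = n.getPreviousSibling()) {
            if (n instanceof Element) {
                return (Element) n;
            }
        }
        return null;
    }
}
```

Calling `NaRowRemover.removeNaRows(fullHtmlStr)` would return the table without the N/A rows. The header and total rows are left alone because their last cell is either a `<th>` or does not contain N/A.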
Upvotes: 1