Reputation: 37
I have an HTML table with some data in the main table and some in a nested table inside a td element.
I only want the five required values (marked with ** xx **), so that I can export them to Excel as one single row.
<table cellpadding="2" cellspacing="0" width="100%" class="chart">
<tr>
<td>**Text 1**</td>
<td>
<table cellpadding="2" cellspacing="0">
<tr>
<td>some useless data</td>
<td>**Text 2**</td>
</tr>
</table>
</td>
<td>**Text 3**</td>
<td>**Text 4**</td>
<td>**Text 5**</td>
</tr>
</table>
My code looks like this:
for (Element row : excel.select("tr")) {
    // create a row for each tr tag
    header = sheet.createRow(rowCount);
    // loop through all th tags
    Elements ths = row.select("th");
    int count = 0;
    for (Element element : ths) {
        // set header style
        cell = header.createCell(count);
        cell.setCellValue(element.text());
        cell.setCellStyle(headerStyle);
        count++;
    }
    // now loop through all td tags
    Elements tds = row.select("td");
    count = 0;
    for (Element element : tds) {
        if (!element.text().isEmpty()) {
            cell = header.createCell(count);
            cell.setCellValue(element.text());
            count++;
        }
    }
}
The problem here is that the output was not as expected.
It looks like this in Excel:
Row1: Text 1 | Text 2 | useless data | Text 2 | Text 3 | Text 4 | Text 5 |
Row2: useless data | Text 2 |
Additional information: some tags are omitted to simplify the question.
What I want is:
Row1: Text 1 | Text 2 | Text 3 | Text 4 | Text 5 |
Upvotes: 1
Views: 1922
Reputation: 11712
1. Two rows
I guess excel is the Document or the table. Either way, when you call excel.select("tr") you also pick up the tr of the inner table. To prevent this, you need to make the CSS selector more specific. Assuming excel is the Document, you can do this:
Elements outerTrs = excel.select("table.chart>tbody>tr");
In the context of your code:
for (Element row : excel.select("table.chart>tbody>tr")) {
Explanation:
Jsoup creates a tbody element inside a table if one is not present. The selector makes sure that only tr elements that are direct children of the outer table's tbody are selected. I can do this because I know the class name of the outer table, and it seems to be unique.
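To see the effect of the implicit tbody and of the child combinator, here is a minimal sketch. It assumes jsoup is on the classpath; the class name OuterRowSelectorDemo and the shortened HTML string are made up for illustration and are not part of the question's code.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class OuterRowSelectorDemo {
    // Shortened version of the table from the question (illustrative only).
    static final String HTML =
        "<table class=\"chart\"><tr>"
        + "<td>Text 1</td>"
        + "<td><table><tr><td>some useless data</td><td>Text 2</td></tr></table></td>"
        + "<td>Text 3</td>"
        + "</tr></table>";

    // Counts every tr in the document, including the one in the nested table.
    static int allRows(Document doc) {
        return doc.select("tr").size();
    }

    // Counts only tr elements that are direct children of the outer table's tbody.
    static int outerRows(Document doc) {
        return doc.select("table.chart > tbody > tr").size();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(HTML);
        // The source markup contains no tbody; any found here were added by the parser.
        System.out.println("tbody elements: " + doc.select("tbody").size());
        System.out.println("all tr: " + allRows(doc));
        System.out.println("outer tr: " + outerRows(doc));
    }
}
```

With this sample markup, the broad selector should report two tr elements, while the narrowed selector reports only the one outer row.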
2. Unexpected number of columns
This happens because row.select("td") matches every td below the row, including the td that contains the inner table as well as the tds inside that table. If you want only tds with no child elements, you could use this:
Elements tds = row.select("td");
count = 0;
for (Element element : tds) {
    // keep only non-empty tds that contain no child elements
    if (!element.text().isEmpty() && element.children().isEmpty()) {
        count++;
        System.out.println("line " + count + " text = '" + element.text() + "'");
    }
}
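As a rough check of what this filter keeps, the sketch below collects the surviving cell texts into a list instead of printing them. It assumes jsoup is available; LeafCellDemo, leafCellTexts and firstOuterRow are hypothetical names, and the HTML string is a trimmed copy of the table from the question.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

class LeafCellDemo {
    // Trimmed copy of the question's table (illustrative only).
    static final String HTML =
        "<table class=\"chart\"><tr>"
        + "<td>Text 1</td>"
        + "<td><table><tr><td>some useless data</td><td>Text 2</td></tr></table></td>"
        + "<td>Text 3</td><td>Text 4</td><td>Text 5</td>"
        + "</tr></table>";

    // Returns the text of every td below the row that has text and no child elements.
    static List<String> leafCellTexts(Element row) {
        List<String> texts = new ArrayList<>();
        for (Element td : row.select("td")) {
            if (!td.text().isEmpty() && td.children().isEmpty()) {
                texts.add(td.text());
            }
        }
        return texts;
    }

    // Applies the filter to the first outer row of the chart table.
    static List<String> firstOuterRow(String html) {
        Element row = Jsoup.parse(html).select("table.chart > tbody > tr").first();
        return leafCellTexts(row);
    }

    public static void main(String[] args) {
        System.out.println(firstOuterRow(HTML));
    }
}
```

The td wrapping the inner table is dropped because it has a child element, but the "some useless data" cell is itself a leaf, so it still passes this filter and has to be handled separately.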
3. Useless data
To get rid of this, you need to filter it out explicitly. Your example does not make clear when the useless data is present. Is it always the first td in the inner table? If so, you can do this (full solution):
// tab holds the HTML source as a String
Document excel = Jsoup.parse(tab);
for (Element row : excel.select("table.chart>tbody>tr")) {
    Elements tds = row.select("td");
    int count = 0;
    // first td of the inner table, assumed to hold the useless data
    Element junkTd = row.select("td table td").first();
    for (Element element : tds) {
        if (!element.text().isEmpty()
                && element.children().isEmpty()
                && !element.equals(junkTd)) {
            count++;
            System.out.println("line " + count + " text = '" + element.text() + "'");
        }
    }
}
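The same filtering can be wrapped in a small helper that returns the cells of the row as a list, which is easier to hand over to the Excel-writing code. This is only a sketch under the same assumption as above (the junk cell is always the first td of the inner table); SingleRowExtractor and extract are hypothetical names, and the junk cell is compared by identity rather than with equals.

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class SingleRowExtractor {
    // Trimmed copy of the question's table (illustrative only).
    static final String HTML =
        "<table class=\"chart\"><tr>"
        + "<td>Text 1</td>"
        + "<td><table><tr><td>some useless data</td><td>Text 2</td></tr></table></td>"
        + "<td>Text 3</td><td>Text 4</td><td>Text 5</td>"
        + "</tr></table>";

    // Collects the wanted cell texts of every outer row of table.chart.
    static List<String> extract(String html) {
        List<String> cells = new ArrayList<>();
        Document doc = Jsoup.parse(html);
        for (Element row : doc.select("table.chart > tbody > tr")) {
            // Assumption: the first td of the inner table is the junk cell (null if absent).
            Element junkTd = row.select("td table td").first();
            for (Element td : row.select("td")) {
                boolean isLeaf = td.children().isEmpty();
                boolean isJunk = td == junkTd;
                if (isLeaf && !isJunk && !td.text().isEmpty()) {
                    cells.add(td.text());
                }
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        System.out.println(extract(HTML));
    }
}
```

Each entry of the returned list can then be written with header.createCell(i) and cell.setCellValue(...), as in the loop from the question.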
Upvotes: 1