Reputation: 115
I've been using Jsoup for HTML parsing, but I've run into a big problem: parsing takes far too long, about an hour.
Here's a sample of the HTML on the site that I am parsing:
<tr>
<td class="class1">value1 </td>
<td class="class1">value2</td>
<td class="class1">value3</td>
<td class="class1">value4</td>
<td class="class1">value5 </td>
<td class="class1">value6</td>
<td class="class1">value7</td>
<td class="class1">value8</td>
<td class="class1">value9</td>
</tr>
The site contains thousands of rows like this, and I need to parse them all into a list. I only need value1 and value6, so I am using this code:
Document doc = Jsoup.connect(url).get();
ls = new LinkedList();
for(int i = 15; i<doc.text().length(); i++) { // 15 because the rows I want start at index 15
    Element element = doc.getElementsByTag("tr").get(i); // row index
    Elements row = element.getElementsByTag("td");
    value6 = row.get(5).text(); // getting value6
    value1 = row.get(0).text(); // getting value1
    node = new Node(value1, value6);
    ls.insert(node);
}
As I said, it takes too much time, so I need to make it faster. Any ideas on how to fix this problem?
Upvotes: 0
Views: 242
Reputation: 11712
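I think your problem stems from the for loop: for(int i = 15; i<doc.text().length(); i++). Its bound is the number of characters in the document's text, not the number of table rows, so you are effectively looping over the whole text character by character. I highly doubt that this is what you want. On top of that, doc.text() and doc.getElementsByTag("tr") are evaluated again on every iteration, and each call walks the entire document. I think you want to iterate over the table rows instead, selecting them once before the loop. Something like this should work: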
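import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = Jsoup.connect(url).get();
Elements trs = doc.select("tr"); // select the rows once, outside the loop
for (int i = 15; i < trs.size(); i++) { // the rows of interest start at index 15
    Element tr = trs.get(i);
    Elements tds = tr.select("td");
    if (tds.size() < 6) continue; // defensive: skip rows with too few cells
    String value1 = tds.get(0).text(); // getting value1 (first cell)
    String value6 = tds.get(5).text(); // getting value6 (sixth cell)
    // do whatever you need to do with the values
}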
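To make the cost difference concrete, here is a minimal sketch that deliberately does not use Jsoup. The class HoistDemo and the method expensiveScan are hypothetical names introduced only for illustration: expensiveScan stands in for a call such as doc.text() or doc.getElementsByTag("tr") that walks the whole document, and a counter records how often it runs when it is re-evaluated inside the loop versus hoisted out of it.

```java
import java.util.ArrayList;
import java.util.List;

public class HoistDemo {
    // Counts how many times the simulated full-document scan runs.
    static int scans = 0;

    // Hypothetical stand-in for a call that traverses the whole document,
    // such as doc.getElementsByTag("tr"). Here it only copies the list.
    static List<String> expensiveScan(List<String> rows) {
        scans++;
        return new ArrayList<>(rows);
    }

    // The scan is re-evaluated in the loop condition and in the loop body,
    // mirroring the structure of the loop in the question.
    static int scansWhenRepeated(List<String> rows) {
        scans = 0;
        for (int i = 0; i < expensiveScan(rows).size(); i++) {
            expensiveScan(rows).get(i);
        }
        return scans;
    }

    // The scan runs once before the loop and its result is reused.
    static int scansWhenHoisted(List<String> rows) {
        scans = 0;
        List<String> cached = expensiveScan(rows);
        for (int i = 0; i < cached.size(); i++) {
            cached.get(i);
        }
        return scans;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            rows.add("row" + i);
        }
        System.out.println("repeated: " + scansWhenRepeated(rows)); // 2001
        System.out.println("hoisted:  " + scansWhenHoisted(rows));  // 1
    }
}
```

With 1000 rows, the repeated version performs 2001 scans (1001 condition checks plus 1000 loop bodies) while the hoisted version performs one. In the real page each scan is a full pass over a large document, and the original loop bound is the character count rather than the row count, so the gap is likely far larger.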
Upvotes: 2