Reputation: 3057
I am working on a Maven project that parses HTML data from a website. I was able to parse it using the code below:
public void parseData() {
    String url = "http://stackoverflow.com/help/on-topic";
    try {
        Document doc = Jsoup.connect(url).get();
        Element essay = doc.select("div.col-section").first();
        String essayText = essay.text();
        jTextAreaAdem.setText(essayText);
    } catch (IOException ex) {
        Logger.getLogger(formAdem.class.getName()).log(Level.SEVERE, null, ex);
    }
}
So far I have no problems: the HTML data parses fine. I use jsoup's select method with "div.col-section", which means I'm looking for a div element whose class is col-section. I want to print the data in a text area. The result I get is one huge paragraph, even though the real data on the website has more than one paragraph. How can I parse the data so that it keeps the paragraph breaks shown on the website?
Upvotes: 5
Views: 12469
Reputation: 10522
The reason that it is not formatted is that the formatting is in the HTML, with <p> and <ol> tags etc. Calling .text() on a block element loses that formatting.
Jsoup has an example HTML to Plain Text converter which you can adapt to your needs by providing the div element as the focus.
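To give a rough idea of what such an adaptation might look like, here is a minimal sketch of the same visitor idea. It is not the jsoup example itself: the class name PlainTextSketch, the method toPlainText, the "- " list marker, and the set of tags that trigger a line break are all assumptions made for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;
import org.jsoup.select.NodeVisitor;

// Hypothetical helper, not part of jsoup: walks an element's subtree and
// emits its text, adding a line break after selected block-level tags.
class PlainTextSketch {

    static String toPlainText(Element root) {
        final StringBuilder out = new StringBuilder();
        root.traverse(new NodeVisitor() {
            @Override
            public void head(Node node, int depth) {
                if (node instanceof TextNode) {
                    // TextNode.text() returns the whitespace-normalised text
                    out.append(((TextNode) node).text());
                } else if (node.nodeName().equals("li")) {
                    out.append("- "); // assumed marker for list items
                }
            }

            @Override
            public void tail(Node node, int depth) {
                String name = node.nodeName();
                // Assumed set of tags that should end a line
                if (name.equals("p") || name.equals("li") || name.equals("br")
                        || name.matches("h[1-6]")) {
                    out.append("\n");
                }
            }
        });
        return out.toString().trim();
    }

    public static void main(String[] args) {
        // Inline HTML stands in for the fetched page so the sketch runs offline
        String html = "<div class=\"col-section\"><p>First</p>"
                + "<ul><li>a</li><li>b</li></ul><p>Second</p></div>";
        Document doc = Jsoup.parse(html);
        Element essay = doc.select("div.col-section").first();
        System.out.println(toPlainText(essay));
    }
}
```

In the question's code, the result of toPlainText(essay) could be passed to jTextAreaAdem.setText(...) in place of essay.text().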
Alternatively, you could just select "div.col-section > *", and iterate through each Element, and print out that text with a newline.
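A minimal sketch of that second approach, assuming the page is already parsed into a Document. The class name ChildBlockText and the method childBlocksToText are made up for illustration; only Jsoup.parse, select, and text() are jsoup calls.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper: joins the text of each direct child of a container
// with a newline, so each <p>, <ol>, etc. ends up on its own line.
class ChildBlockText {

    static String childBlocksToText(Document doc, String containerSelector) {
        StringBuilder out = new StringBuilder();
        // "> *" matches every direct child element of the container
        for (Element child : doc.select(containerSelector + " > *")) {
            String text = child.text();
            if (text.isEmpty()) {
                continue; // skip children that contain no text
            }
            if (out.length() > 0) {
                out.append("\n");
            }
            out.append(text);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Inline HTML stands in for Jsoup.connect(url).get() so this runs offline
        String html = "<div class=\"col-section\">"
                + "<p>First paragraph.</p><p>Second paragraph.</p></div>";
        Document doc = Jsoup.parse(html);
        System.out.println(childBlocksToText(doc, "div.col-section"));
    }
}
```

Note that a list such as <ol> is still flattened onto one line this way, because text() is called on the whole list element; selecting the list items separately would be needed to break those up.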
Upvotes: 5