Reputation: 279
Here is my code. The printed output includes blank lines between the paragraphs. How can I remove those blank lines? After that, I want to store the text sentence by sentence in an ArrayList.
public static void main(String[] args) {
    try {
        String url = "http://www.divaina.com/";
        System.setProperty("http.proxyHost", "cache.mrt.ac.lk");
        System.setProperty("http.proxyPort", "3128");
        Document doc = Jsoup.connect(url).timeout(10000).get();
        Elements paragraphs = doc.select("p");
        for (Element p : paragraphs) {
            System.out.println(p.text());
        }
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}
When I insert the content directly into the database, the blank lines are inserted as well. How can I remove the blank lines between paragraphs? What I actually want is to read the content of a web page and add it to the database line by line. Is there a better way to do this?
Upvotes: 0
Views: 1003
Reputation: 1667
Evidently some of the paragraphs contain no text, so p.text() returns an empty string for them and a blank line is printed. Skipping those paragraphs might help:
for (Element p : paragraphs)
{
    String text = p.text().trim();
    // Skip paragraphs that contain no text
    if (!text.isEmpty())
        System.out.println(text);
}
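For the second half of the question (storing the text sentence by sentence in a list), one possible approach is sketched below. It assumes the paragraph strings have already been obtained with p.text() as in the loop above. The class name SentenceCollector and the method toSentences are made up for illustration and are not part of Jsoup. Sentence boundaries are found with java.text.BreakIterator from the standard library. This is a heuristic that can mis-split around abbreviations, and Locale.ENGLISH is an assumption that may need changing for the actual page language.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of Jsoup.
public class SentenceCollector {

    // Splits each non-empty paragraph into sentences and collects them in order.
    public static List<String> toSentences(List<String> paragraphTexts) {
        List<String> sentences = new ArrayList<>();
        // Assumption: English sentence rules are good enough for the page.
        BreakIterator boundaries = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        for (String raw : paragraphTexts) {
            String text = raw.trim();
            if (text.isEmpty()) {
                continue; // skip paragraphs that carry no text
            }
            boundaries.setText(text);
            int start = boundaries.first();
            for (int end = boundaries.next();
                    end != BreakIterator.DONE;
                    start = end, end = boundaries.next()) {
                String sentence = text.substring(start, end).trim();
                if (!sentence.isEmpty()) {
                    sentences.add(sentence);
                }
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Stand-in for the strings returned by p.text() on each <p> element.
        List<String> paragraphs = Arrays.asList(
                "First sentence. Second sentence.", "", "   ", "Third one.");
        for (String sentence : toSentences(paragraphs)) {
            System.out.println(sentence);
        }
    }
}
```

In the original loop, each p.text() value could be added to a list and passed to toSentences once the loop finishes. The resulting list can then be written to the database one entry at a time.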
Upvotes: 1
Reputation: 1055
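Use a regular expression. Note that this one removes every whitespace character from the string, including the spaces between words: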
String withoutspace = whitespace.replaceAll("\\s", "");
Or try this, which removes only the line breaks:
String withoutSpace = whitespace.replace("\n", "").replace("\r", "");
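To make the difference between these replacements concrete, here is a small sketch. The class name WhitespaceDemo, the method names, and the sample string are invented for illustration. The third variant, which collapses each run of whitespace into a single space, is not one of the two suggestions above. It is included as an assumption about what is usually wanted when the words themselves should stay separated.

```java
// Hypothetical demo class comparing three ways of removing whitespace.
public class WhitespaceDemo {

    // Removes every whitespace character, including spaces between words.
    public static String stripAll(String s) {
        return s.replaceAll("\\s", "");
    }

    // Removes only carriage returns and line feeds; other spaces are kept.
    public static String stripLineBreaks(String s) {
        return s.replace("\n", "").replace("\r", "");
    }

    // Collapses each run of whitespace into one space and trims both ends.
    public static String collapse(String s) {
        return s.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String sample = "  Hello   world.\r\n\r\nNext line.  ";
        System.out.println("[" + stripAll(sample) + "]");
        System.out.println("[" + stripLineBreaks(sample) + "]");
        System.out.println("[" + collapse(sample) + "]");
    }
}
```

If the goal is only to drop empty paragraphs before inserting into the database, checking for an empty string per paragraph is probably enough, and no replacement on the text itself is needed.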
Upvotes: 0