Reputation: 161
I want to extract the text from an HTML file in Java.
My HTML file is:
<body>
<p>vishal</p>
<strong>patel</strong>
<bold>vishal patel</bold>
</body>
I want the output to look like this:
vishal
patel
vishal patel
How can I do this?
Upvotes: 9
Views: 18757
It is better to use an HTML parser than to strip the tags by hand. I prefer jsoup, an open-source HTML parsing library:
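import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

import org.jsoup.Jsoup;

public class HTMLUtils {
    // Reads all HTML from the reader and returns its text with the tags stripped.
    public static String extractText(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        BufferedReader br = new BufferedReader(reader);
        String line;
        while ((line = br.readLine()) != null) {
            // Keep the line break so words on adjacent lines are not glued together.
            sb.append(line).append('\n');
        }
        // text() returns the combined, whitespace-normalized text of the document.
        return Jsoup.parse(sb.toString()).text();
    }

    public static void main(String[] args) throws Exception {
        // try-with-resources closes the file even if parsing fails.
        try (FileReader reader = new FileReader("C:/RealHowTo/topics/java-language.html")) {
            System.out.println(HTMLUtils.extractText(reader));
        }
    }
}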
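Note that text() returns all of the document's text as one whitespace-normalized string, so the code above prints everything on a single line rather than one line per element as the question asks. If you need one line per piece of text, here is a rough sketch of one way to do it. It is an assumption on my part, not part of the jsoup approach: it swaps in the HTML parser that ships with the JDK (javax.swing.text.html), and the class name HtmlTextLines and its extract method are hypothetical names made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper (not from the answer above): collects each run of text
// found between tags, using the HTML parser bundled with the JDK.
class HtmlTextLines extends HTMLEditorKit.ParserCallback {
    private final List<String> lines = new ArrayList<>();

    // The parser calls this for each run of text it finds between tags.
    @Override
    public void handleText(char[] data, int pos) {
        String text = new String(data).trim();
        if (!text.isEmpty()) { // skip whitespace-only runs
            lines.add(text);
        }
    }

    // Parses the HTML from the reader and returns one entry per text run.
    static List<String> extract(Reader reader) throws IOException {
        HtmlTextLines callback = new HtmlTextLines();
        // The last argument (true) tells the parser to ignore any charset
        // declared inside the document itself.
        new ParserDelegator().parse(reader, callback, true);
        return callback.lines;
    }

    public static void main(String[] args) throws IOException {
        // Sample input modeled on the HTML in the question.
        String html = "<body><p>vishal</p><strong>patel</strong>"
                + "<bold>vishal patel</bold></body>";
        for (String line : extract(new StringReader(html))) {
            System.out.println(line);
        }
    }
}
```

To read from a file instead, pass a FileReader to extract in place of the StringReader. The JDK parser is old and lenient, so for messy real-world pages jsoup is still likely the more robust choice.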
Upvotes: 4