Retrieve text from an HTML file in Java

I want to get the text from an HTML file in Java.

My HTML file is:

<body>

<p>vishal</p>
<strong>patel</strong>
<bold>vishal patel</bold>
</body>

I want the output to look like this:

vishal 

patel

vishal patel

How can I do this? Please help.

Upvotes: 9

Views: 18757

Answers (2)

user1082916

It is better to use an HTML parser. I prefer the jsoup parser (an open-source package):

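import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;

import org.jsoup.Jsoup;

public class HTMLUtils {

    // Reads all HTML from the reader and returns only its text content.
    public static String extractText(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        BufferedReader br = new BufferedReader(reader);
        String line;
        while ((line = br.readLine()) != null) {
            // Keep the line break so words on adjacent lines are not glued together.
            sb.append(line).append('\n');
        }
        return Jsoup.parse(sb.toString()).text();
    }

    public static void main(String[] args) throws Exception {
        // try-with-resources closes the file when we are done with it.
        try (FileReader reader = new FileReader("C:/RealHowTo/topics/java-language.html")) {
            System.out.println(HTMLUtils.extractText(reader));
        }
    }
}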
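jsoup can also read the file itself, which removes the need for the manual BufferedReader loop. A minimal sketch using Jsoup.parse(File, String), assuming the file is UTF-8 encoded; the class name HtmlFileText and the default path page.html are hypothetical:

```java
import java.io.File;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class HtmlFileText {

    // Parses the HTML file directly with jsoup (assumes UTF-8) and returns its text only.
    public static String extractText(File htmlFile) throws IOException {
        Document doc = Jsoup.parse(htmlFile, "UTF-8");
        return doc.text();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass your own file as the first argument.
        File file = new File(args.length > 0 ? args[0] : "page.html");
        System.out.println(extractText(file));
    }
}
```

Passing the charset explicitly avoids depending on the platform default encoding, which is what FileReader uses on older JDKs.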

Upvotes: 4

Rakesh

Reputation: 4334

I have used a library called jsoup.
It makes retrieving the text-only part of an HTML file very simple:

Jsoup.parse(html).text();

gives you the text of the HTML, where html is a String containing the file's contents.
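Note that text() collapses the whole document into one whitespace-normalized string, so for the HTML in the question it produces a single line rather than the separate lines requested. A minimal sketch that instead collects the text of each direct child element of <body>, one entry per element; it assumes each piece of text sits in its own top-level element, and the class name ElementLines and method extractLines are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ElementLines {

    // Returns the text of each direct child element of <body>, skipping empty ones.
    public static List<String> extractLines(String html) {
        Document doc = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();
        for (Element child : doc.body().children()) {
            String text = child.text();
            if (!text.isEmpty()) {
                lines.add(text);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String html = "<body>\n"
                + "<p>vishal</p>\n"
                + "<strong>patel</strong>\n"
                + "<bold>vishal patel</bold>\n"
                + "</body>";
        // Whole-document text: everything ends up on one line.
        System.out.println(Jsoup.parse(html).text());
        // One line per top-level element.
        for (String line : extractLines(html)) {
            System.out.println(line);
        }
    }
}
```

For the question's HTML, extractLines returns "vishal", "patel" and "vishal patel", matching the requested output (without the blank lines). If the text is nested more deeply, a CSS query such as doc.select("p, strong") can pick out the elements instead of body().children().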

Upvotes: 23
