Parsing Html content using Jsoup

Question

This is my HTML source

This is my Java program. It downloads the page and strips the HTML tags with a regex:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import android.widget.TextView;

    // Inside an Activity method, e.g. onCreate()
    TextView tv = (TextView) findViewById(R.id.textView1);
    try {
        URL myurl = new URL("http://www.somewebsite.com");
        HttpURLConnection con = (HttpURLConnection) myurl.openConnection();
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(con.getInputStream()));
        StringBuilder sb = new StringBuilder();

        // Append every line of the response, separated by the line separator
        for (String line; (line = reader.readLine()) != null; ) {
            sb.append(line).append(System.getProperty("line.separator"));
        }
        reader.close();

        // Remove every '<...>' run that fits on one line, i.e. the tags
        String finalResult = sb.toString().replaceAll("<.*?>", "");
        tv.setText(finalResult);
    } catch (IOException e) {
        e.printStackTrace();
        tv.setText("not working");
    }
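A quick way to see why a tag-stripping regex is fragile is to run a pattern like `<.*?>` (the same idea as the code above) on a few inputs. This is a JDK-only sketch; the class name `RegexStripDemo` and the sample strings are made up for illustration. It shows that entities such as `&amp;` are not decoded, script bodies leak into the "text", a `>` inside a quoted attribute value ends the match early, and, because `.` does not match line breaks, a tag split across lines is left in place.

```java
public class RegexStripDemo {

    // Same idea as the question's code: delete each '<' ... next '>' run
    static String stripTags(String html) {
        return html.replaceAll("<.*?>", "");
    }

    public static void main(String[] args) {
        // Entities are not decoded: prints "Item 2 &amp; 222"
        System.out.println(stripTags("<p>Item 2 &amp; 222</p>"));

        // Script contents are not tags, so they stay: prints "var x = 1;Item 2 - 222"
        System.out.println(stripTags("<script>var x = 1;</script><p>Item 2 - 222</p>"));

        // A '>' inside an attribute value ends the match early: prints ' b">link'
        System.out.println(stripTags("<a title=\"a > b\">link</a>"));

        // '.' does not match '\n', so a tag spanning two lines survives
        System.out.println(stripTags("<p\nclass=\"x\">hi</p>"));
    }
}
```

An HTML parser such as Jsoup builds a real document tree instead, so it handles all of these cases when extracting text.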

Is there an easier way to parse the HTML content in Java with Jsoup instead of a regex?

Also, is there a way to extract only the content I need? Here I just want the text "Item 2 - 222".

Jeeshu Mittal · Accepted Answer

Try this for easy parsing with Jsoup:

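    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    // To fetch and parse an HTML page (get() throws IOException)
    Document doc = Jsoup.connect("http://www.website.com").get();

    // To parse HTML you already have as a String
    String html = "<html><head><title>First parse</title></head>"
            + "<body><p>Parsed HTML into a doc.</p></body></html>";
    Document doc1 = Jsoup.parse(html);

    // Text of the <body>, with all tags removed
    String content = doc.body().text();

    // To get specific elements such as links
    Elements links = doc.select("a[href]");
    for (Element e : links) {
        System.out.println("link: " + e.attr("abs:href"));
    }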
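The answer does not show how to pull out only "Item 2 - 222", and the question's HTML source is not reproduced here. The following is a minimal sketch that assumes hypothetical markup in which the items are `<li>` elements inside `<ul id="items">`; the class `ItemExtractor`, the helper `findItem`, and `SAMPLE_HTML` are illustrative names, not part of the original code. It selects the item with Jsoup's `:containsOwn(text)` pseudo-selector.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ItemExtractor {

    // Hypothetical markup standing in for the asker's page (the real structure is unknown)
    static final String SAMPLE_HTML =
            "<html><body>"
            + "<ul id=\"items\">"
            + "<li>Item 1 - 111</li>"
            + "<li>Item 2 - 222</li>"
            + "<li>Item 3 - 333</li>"
            + "</ul>"
            + "</body></html>";

    // Returns the text of the first <li> under ul#items whose own text
    // contains the label, or null if no such element exists
    static String findItem(String html, String label) {
        Document doc = Jsoup.parse(html);
        Element li = doc.select("ul#items > li:containsOwn(" + label + ")").first();
        return li == null ? null : li.text();
    }

    public static void main(String[] args) {
        // Prints "Item 2 - 222"
        System.out.println(findItem(SAMPLE_HTML, "Item 2"));
    }
}
```

`:containsOwn` does a case-insensitive substring match, so a label such as "Item 2" would also match "Item 20 - ..."; against the real page, the selector would need to be adapted to its actual structure (for example an `id` or `class` on the target element).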

To learn more, see the Jsoup documentation.
