Aldridge1991

Reputation: 1367

Parse HTML text in Android

I'm trying to parse some HTML in my Android app and I need to get the text:

Pan Artesano Elaborado por Panadería La Constancia. ¡Esta Buenísimo!

in this HTML:

[image: HTML source containing the text above]

Is there an easy way to get only the text and remove all HTML tags?

The behavior I need is exactly that of PHP's strip_tags function: http://php.net/manual/es/function.strip-tags.php

Upvotes: 1

Views: 1392

Answers (3)

Martin

Reputation: 650

First, fetch the HTML with:

// Request the page (Apache HttpClient API).
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet(url);
HttpResponse response = client.execute(request);

// Read the response body line by line into one string.
InputStream in = response.getEntity().getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder str = new StringBuilder();
String line;
while ((line = reader.readLine()) != null) {
    // readLine() drops the line terminator, so put one back.
    str.append(line).append('\n');
}
reader.close(); // also closes the underlying stream
String html = str.toString();

Then I recommend wrapping the text in a custom tag in your HTML, such as <toAndroid></toAndroid>, so that you can extract it with:

String open = "<toAndroid>";
String result = html.substring(html.indexOf(open) + open.length(), html.indexOf("</toAndroid>"));

For example, this HTML:

<toAndroid>Hello world!</toAndroid>

produces:

Hello world!

Note that you can also put <p> tags inside the <toAndroid> tags and then remove them from the result in Java.
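The extraction and the follow-up tag removal can be sketched roughly as below. This is only an illustration: the class name CustomTagText and the helpers between and stripTags are hypothetical names, and the regex-based removal is a naive approximation, not a real HTML parser (it will mishandle things like a ">" inside an attribute value).

```java
public class CustomTagText {

    // Hypothetical helper: returns the text between <tag> and </tag>,
    // or null if either the opening or the closing tag is missing.
    public static String between(String html, String tag) {
        String open = "<" + tag + ">";
        String close = "</" + tag + ">";
        int start = html.indexOf(open);
        if (start < 0) {
            return null;
        }
        start += open.length(); // skip past the opening tag itself
        int end = html.indexOf(close, start);
        if (end < 0) {
            return null;
        }
        return html.substring(start, end);
    }

    // Hypothetical helper: naive tag removal. Deletes anything of the
    // form <...> and trims surrounding whitespace.
    public static String stripTags(String fragment) {
        return fragment.replaceAll("<[^>]*>", "").trim();
    }

    public static void main(String[] args) {
        String html = "<html><body><toAndroid><p>Hello world!</p></toAndroid></body></html>";
        String inner = between(html, "toAndroid"); // "<p>Hello world!</p>"
        System.out.println(stripTags(inner));      // prints "Hello world!"
    }
}
```

Checking for a missing tag before calling substring avoids the StringIndexOutOfBoundsException that a bare indexOf/substring combination throws when the page does not contain the custom tag.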

Upvotes: -1

user8579885

Reputation:

If you just want to display it, a WebView will do: load the string into the WebView and you're done.

If you need to use the text elsewhere, I don't know how to do that :D.

String data = "your html here";
WebView webview = (WebView) this.findViewById(R.id.webview);
webview.getSettings().setJavaScriptEnabled(true);
webview.loadDataWithBaseURL("", data, "text/html", "UTF-8", "");

You can also load a page directly from its URL with webview.loadUrl("url");

Upvotes: 0

Nirav Joshi

Reputation: 1723

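// Parse the HTML string with jsoup.
Document doc = Jsoup.parse(html);
// "someid" is a placeholder for the id of the element that holds the text.
Element content = doc.getElementById("someid");
Elements p = content.getElementsByTag("p");

// text() returns an element's text with all tags removed.
StringBuilder pConcatenated = new StringBuilder();
for (Element x : p) {
    pConcatenated.append(x.text());
}

System.out.println(pConcatenated); // sometext another p tag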

Upvotes: 2
