Reputation: 9
How can I extract text from an HTML page? For example, I want to take the text from the web page http://www.atempodihockey.it/campionati/campionati-hil/serie-a1-2013-2014/calendario.html. I need the name of each team and the result of each match.
Upvotes: 1
Views: 772
Reputation: 2134
For this purpose, you can use HtmlAgilityPack.
Do it as follows:
Add a reference to HtmlAgilityPack in your project and import its namespace:
using HtmlAgilityPack;
Then pass the URL to HtmlWeb.Load to fetch and parse the full page:
HtmlWeb webGet = new HtmlWeb();
HtmlDocument document = webGet.Load("http://www.atempodihockey.it/campionati/campionati-hil/serie-a1-2013-2014/calendario.html");
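The 'document' variable now holds the parsed HTML. From it you can select the nodes that contain the text you need, for example with document.DocumentNode.SelectNodes and an XPath expression that matches the page's markup.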
Upvotes: 0
Reputation: 2436
I think the code below can help you. First set up the WebView:
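webView = (WebView) findViewById(R.id.webterms);
// JavaScript must be enabled for the javascript: call made in onPageFinished.
webView.getSettings().setJavaScriptEnabled(true);
// setPluginsEnabled(true) is left out: plugins are not needed to read page text,
// and the method was removed from WebSettings in API 18.
webView.getSettings().setUserAgentString(
        "Mozilla/5.0 (Linux; U; Android 2.0; en-us; Droid Build/ESD20) AppleWebKit/530.17 (KHTML, like Gecko) Version/4.0 Mobile Safari/530.17");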
After creating the WebView, register the JavaScript interface and a WebViewClient, then load your URL or HTML page:
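webView.addJavascriptInterface(new MyJavaScriptInterface(), "HTMLOUT");
webView.setWebViewClient(new WebViewClient() {
    @Override
    public boolean shouldOverrideUrlLoading(WebView view, String url) {
        // Returning false lets the WebView load the URL itself,
        // so an extra view.loadUrl(url) call is not needed.
        return false;
    }

    @Override
    public void onPageFinished(WebView view, String url) {
        // pDialog is a progress dialog assumed to be created elsewhere in the activity.
        if (pDialog != null && pDialog.isShowing()) {
            pDialog.dismiss();
        }
        // Hand the rendered page text to processHTML through the "HTMLOUT" bridge.
        webView.loadUrl("javascript:window.HTMLOUT.processHTML(document.documentElement.innerText);");
    }
});
webView.loadUrl(url);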
Then create a class with a single method that receives the page content:
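class MyJavaScriptInterface {
    // Required on API 17 and later (import android.webkit.JavascriptInterface);
    // without it the page cannot call this method.
    @JavascriptInterface
    public void processHTML(String html) {
        // 'html' holds document.documentElement.innerText, i.e. the visible page text.
        if (null != html && html.trim().length() > 0) {
            System.out.println("your Html ->" + html);
        }
    }
}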
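Once processHTML receives the page text, the team names and results still have to be picked out of it. Below is a minimal sketch of one way to do that in plain Java. It assumes each match appears on its own line in the form "Home Team - Away Team 3-2"; this layout is only a guess and has not been checked against the real page, so the regular expression will probably need adjusting. MatchParser and MatchResult are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchParser {

    /** One parsed match: both team names and the goals scored by each. */
    public static final class MatchResult {
        public final String home;
        public final String away;
        public final int homeGoals;
        public final int awayGoals;

        MatchResult(String home, String away, int homeGoals, int awayGoals) {
            this.home = home;
            this.away = away;
            this.homeGoals = homeGoals;
            this.awayGoals = awayGoals;
        }

        @Override
        public String toString() {
            return home + " " + homeGoals + "-" + awayGoals + " " + away;
        }
    }

    // ASSUMED line layout (not taken from the real page):
    //   <home team> - <away team> <homeGoals>-<awayGoals>
    private static final Pattern LINE = Pattern.compile(
            "^(.+?)\\s+-\\s+(.+?)\\s+(\\d{1,3})\\s*-\\s*(\\d{1,3})$");

    /** Returns every line of pageText that matches the assumed layout; other lines are skipped. */
    public static List<MatchResult> parse(String pageText) {
        List<MatchResult> results = new ArrayList<>();
        if (pageText == null) {
            return results;
        }
        for (String line : pageText.split("\\r?\\n")) {
            Matcher m = LINE.matcher(line.trim());
            if (m.matches()) {
                results.add(new MatchResult(
                        m.group(1),
                        m.group(2),
                        Integer.parseInt(m.group(3)),
                        Integer.parseInt(m.group(4))));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Made-up sample text standing in for the string passed to processHTML.
        String sample = "Serie A1 2013-2014\n"
                + "Team Alpha - Team Beta 3-2\n"
                + "Team Gamma - Team Delta 0 - 1\n";
        for (MatchResult r : parse(sample)) {
            System.out.println(r);
        }
    }
}
```

Inside processHTML you could then call MatchParser.parse(html) and loop over the returned list. Lines that do not fit the assumed layout, such as headings, are simply ignored.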
Upvotes: 1