Alessandro Nardi

Reputation: 9

How to extract text from an HTML page?

How can I extract text from an HTML page? For example, I want to take the text from the page at http://www.atempodihockey.it/campionati/campionati-hil/serie-a1-2013-2014/calendario.html. I need the names of the teams and the result of each match.

Upvotes: 1

Views: 772

Answers (2)

PaulShovan

Reputation: 2134

For this purpose, you can use HtmlAgilityPack.

Do it as follows.

Add a reference to HtmlAgilityPack in your project:

using HtmlAgilityPack;

Then pass the URL to load the full page:

HtmlWeb webGet = new HtmlWeb();
HtmlDocument document = webGet.Load("http://www.atempodihockey.it/campionati/campionati-hil/serie-a1-2013-2014/calendario.html");

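From the HTML held in the 'document' variable you can then extract the text you need, for example through document.DocumentNode.InnerText, or by selecting specific nodes with document.DocumentNode.SelectNodes and an XPath expression.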

Upvotes: 0

Hardik Nadiyapara

Reputation: 2436

I think the code below can help you:

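webView = (WebView) findViewById(R.id.webterms);
// JavaScript must be enabled for the javascript: call used later.
webView.getSettings().setJavaScriptEnabled(true);
// setPluginsEnabled was deprecated and removed in later Android API levels; drop this line there.
webView.getSettings().setPluginsEnabled(true);
webView.getSettings().setUserAgentString(
        "Mozilla/5.0 (Linux; U; Android 2.0; en-us; Droid Build/ESD20) AppleWebKit/530.17 (KHTML, like Gecko) Version/4.0 Mobile Safari/530.17");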

After creating your WebView, load your URL or HTML page:

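webView.addJavascriptInterface(new MyJavaScriptInterface(), "HTMLOUT");
webView.setWebViewClient(new WebViewClient() {

    @Override
    public boolean shouldOverrideUrlLoading(WebView view, String url) {
        // Returning false lets the WebView load the URL itself,
        // so an extra view.loadUrl(url) call is not needed here.
        return false;
    }

    @Override
    public void onPageFinished(WebView view, String url1) {
        // pDialog is assumed to be a progress dialog created elsewhere.
        if (pDialog.isShowing()) {
            pDialog.dismiss();
        }
        // Pass the visible text of the page to processHTML on the "HTMLOUT" interface.
        webView.loadUrl("javascript:window.HTMLOUT.processHTML(document.documentElement.innerText);");
    }
});
webView.loadUrl(url);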

Then create a class with a single method that processes the page content:

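class MyJavaScriptInterface {

    // On API level 17 and above, this annotation is required for JavaScript
    // to call the method (import android.webkit.JavascriptInterface).
    @JavascriptInterface
    public void processHTML(String html) {
        // Despite the name, this receives the page's innerText, not its markup.
        if (null != html && html.trim().length() > 0) {
            System.out.println("your Html  ->" + html);
        }
    }
}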
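
Once processHTML has the page text, the team names and results still have to be pulled out of it. The following is only a minimal sketch of that step, not something taken from the page itself: it assumes, purely for illustration, that each match appears on its own line as "home team - away team 3-2". The class MatchResultParser, its MatchResult type, and that line format are all hypothetical and would need adjusting to whatever the real page text looks like.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts match results from plain page text.
class MatchResultParser {

    // Assumed line format (not verified against the real page):
    // "<home team> - <away team> <home goals>-<away goals>"
    private static final Pattern MATCH_LINE =
            Pattern.compile("^(.+?)\\s+-\\s+(.+?)\\s+(\\d{1,3})\\s*-\\s*(\\d{1,3})$");

    // Simple value holder for one parsed match.
    static final class MatchResult {
        final String homeTeam;
        final String awayTeam;
        final int homeGoals;
        final int awayGoals;

        MatchResult(String homeTeam, String awayTeam, int homeGoals, int awayGoals) {
            this.homeTeam = homeTeam;
            this.awayTeam = awayTeam;
            this.homeGoals = homeGoals;
            this.awayGoals = awayGoals;
        }

        @Override
        public String toString() {
            return homeTeam + " " + homeGoals + " - " + awayGoals + " " + awayTeam;
        }
    }

    // Scans the text line by line and keeps only lines that match the assumed format.
    static List<MatchResult> parse(String pageText) {
        List<MatchResult> results = new ArrayList<>();
        if (pageText == null) {
            return results;
        }
        for (String line : pageText.split("\\r?\\n")) {
            Matcher m = MATCH_LINE.matcher(line.trim());
            if (m.matches()) {
                results.add(new MatchResult(
                        m.group(1),
                        m.group(2),
                        Integer.parseInt(m.group(3)),
                        Integer.parseInt(m.group(4))));
            }
        }
        return results;
    }
}
```

Inside processHTML you could then call MatchResultParser.parse(html) and loop over the returned list. If the real text does not follow a line-per-match layout, a proper HTML parser working on the markup is likely a better fit than a regular expression over innerText.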

Upvotes: 1
