user3381831

Reputation: 29

Using Jsoup to get data from html source code

I need to access a URL and pull some information from it. I am using Android Studio. My code does not throw any errors, but it displays no information. I believe the problem is that I am passing the wrong argument to my .select statement. Please keep in mind that I am very new to Java/Android development. Here is my code:

private class FetchAnton extends AsyncTask<Void, Void, Void> {

    String price;
    String url = "http://www.antoncoop.com/markets/cash.php";


    @Override
    protected Void doInBackground(Void... params) {
        try {

            Document document = Jsoup.connect(url).get();                     
            price = String.valueOf(document.select("quotes['KEH15']"));

        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    @Override
    protected void onPostExecute(Void result) {

        TextView priceTextView = (TextView) findViewById(R.id.priceTextView);
        priceTextView.setText(price);

    }

}

And here is the HTML section that "quotes['KEH15']" refers to:

</thead>
            <tbody>
                    <script language="javascript">

                        writeBidRow('Wheat',-60,false,false,false,0.5,'01/15/2015','02/26/2015','All','&nbsp;','&nbsp;',60,'even','c=2246&l=3519&d=G15',quotes['KEH15'], 0-0);
                        writeBidRow('Wheat',-65,false,false,false,0.5,'07/01/2015','07/31/2015','All','&nbsp;','&nbsp;',60,'odd','c=2246&l=3519&d=N15',quotes['KEN15'], 0-0);
                </script>

I need to get the value that the "quotes['KEH15']" slot of the HTML represents into the string called price. When I run the program, my text view changes from the default string to a blank. So I think the code is running, but the text view is being updated with an empty string. Can anyone please help me fix this problem?

Thank you for your help.

Keith

Upvotes: 0

Views: 1512

Answers (1)

Alkis Kalogeris

Reputation: 17745

As @njzk2 mentioned, you need a JavaScript engine to do that. Let me elaborate (since you are a beginner, I'm going to keep it painfully detailed). Jsoup is just a parser. What this means is:

  • It will make an HTTP call to the URL you provided and retrieve an HTTP response. This response, among other things (headers etc.; read more on HTTP if you want details), will include the HTML you are after.
  • It will generate a structured representation of that HTML by creating appropriate Java objects that give you all those nice features you read about in the tutorial (CSS selectors and such).

As mentioned earlier, Jsoup is just a parser. It retrieves information, nothing more, which means it can't execute code to produce new pieces of HTML. Here is an experiment. Visit a URL (Facebook, Gmail, Stack Overflow, whatever works for you, as long as you are certain it has a lot of JavaScript behind it). On that page, press Ctrl+U in Chrome. It will open a new tab. This tab shows you exactly what HTML was received from the server, before any JavaScript was executed and produced new HTML (like the notifications you get on Facebook when you have a message). Now go back to the page and press F12 instead. It will open the developer tools. Here you are going to see something different: the actual HTML as rendered by the browser. When you are using Jsoup, what your program has available is the first HTML, the one before any JavaScript is executed. That is because Jsoup can't execute JavaScript; it is just a parser, not a browser. A browser can render the additional content because it has a JavaScript engine and can execute the JavaScript code.
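To make the difference concrete, here is a minimal sketch in plain Java (no Jsoup, so it runs without extra dependencies). It scans a shortened, hypothetical copy of the markup from the question the way any non-executing parser sees it: as text. The names `RawHtmlDemo`, `RAW_HTML`, and `referencedSymbols` are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RawHtmlDemo {

    // Assumption: a shortened stand-in for the page source shown in the question.
    static final String RAW_HTML =
        "<tbody><script language=\"javascript\">"
        + "writeBidRow('Wheat',-60,'c=2246&l=3519&d=G15',quotes['KEH15'], 0-0);"
        + "</script></tbody>";

    // Matches the literal source text quotes['SYMBOL'] and captures SYMBOL.
    private static final Pattern QUOTE_REF =
        Pattern.compile("quotes\\['([A-Z0-9]+)'\\]");

    // Returns every symbol that the raw source refers to via quotes['...'].
    static List<String> referencedSymbols(String html) {
        List<String> symbols = new ArrayList<>();
        Matcher matcher = QUOTE_REF.matcher(html);
        while (matcher.find()) {
            symbols.add(matcher.group(1));
        }
        return symbols;
    }

    public static void main(String[] args) {
        // The source only contains the expression, never the evaluated price.
        System.out.println(referencedSymbols(RAW_HTML)); // prints [KEH15]
    }
}
```

All the raw source offers is the symbol name KEH15. The price itself only exists after a JavaScript engine has evaluated `quotes`, which is why a text view filled from the unexecuted HTML stays blank.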

There are two options for you.

  1. If the JavaScript you want to execute is something simple that doesn't do any "complex" DOM manipulation and just generates some string or the like, then I suppose you could use ScriptEngine, which can be found in Java 7 and can handle the execution of JavaScript. Mind you, it's JavaScript, not jQuery. ScriptEngine is not a browser. Check a tutorial to see in greater detail what you can accomplish.
  2. If ScriptEngine is lacking, then you are left with a headless browser (a browser without a GUI). A headless browser is a browser for automated tasks. Check Selenium WebDriver. Such tools are used heavily in testing web applications, sites etc. I don't know if you can use it in your Android application though. It is quite large (which is perfectly normal, since it offers an awful lot) and has some dependencies that, I believe, do not play well with Android (same classes, different implementation etc.). Anyway, I haven't done it, so I'm not 100% certain about this. You have to check it out yourself. Alternatively, you could build a web application that does all the parsing and exposes a web service for your app to use.
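For the first option, a rough sketch of evaluating a small script through `javax.script` could look like the following. Treat it as an illustration under assumptions: the script text and the value 5.43 are made up (the real page defines `quotes` somewhere else), `QuoteScriptDemo` and `evaluate` are hypothetical names, and whether a JavaScript engine is registered at all depends on the runtime. Newer JDKs no longer bundle one, and as far as I know `javax.script` is not part of the Android SDK, so the code guards against a missing engine.

```java
import java.util.Optional;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

final class QuoteScriptDemo {

    // Assumption: a made-up script; the real definition of `quotes` is not shown in the question.
    static final String SCRIPT =
        "var quotes = {'KEH15': 5.43}; String(quotes['KEH15']);";

    // Evaluates the script and returns its result as text, or an empty
    // Optional if no JavaScript engine is available or evaluation fails.
    static Optional<String> evaluate(String script) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("javascript");
        if (engine == null) {
            return Optional.empty(); // this runtime ships without a JavaScript engine
        }
        try {
            Object result = engine.eval(script);
            return Optional.ofNullable(result).map(String::valueOf);
        } catch (ScriptException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate(SCRIPT).orElse("no JavaScript engine available"));
    }
}
```

Returning an `Optional` keeps the "no engine" case explicit instead of handing a blank string to the UI, which is the symptom described in the question.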

Upvotes: 2
