Jh62

Reputation: 334

HtmlUnit: clicking a specific link when several links have the same name

I started using HtmlUnit today, so I'm still a bit of a noob.

I've managed to go to IMDb and search for the movie "Sleepers" from 1996, and I get a bunch of results with the same name.

Here are the results from that search.

I want to select the first "Sleepers" in the list, which is the correct one, but I don't know how to get at it with HtmlUnit. I looked through the page source and found the link, but I don't know how to extract it.

I guess I could use a regex, but that would defeat the purpose of using HtmlUnit.

This is my code (it has some bits from HtmlUnit's tutorial and some code I found here):

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.DefaultCredentialsProvider;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlButton;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class IMdB {

    public IMdB() {
        try {
            //final WebClient webClient = new WebClient();

            // Emulate IE8 and route all traffic through an HTTP proxy (host, port)
            final WebClient webClient = new WebClient(BrowserVersion.INTERNET_EXPLORER_8, "10.255.10.34", 8080);

            // Set the proxy username and password
            final DefaultCredentialsProvider credentialsProvider = (DefaultCredentialsProvider) webClient.getCredentialsProvider();
            credentialsProvider.addCredentials("xxxx", "xxxx");

            // Get the first page
            final HtmlPage page1 = webClient.getPage("http://www.imdb.com");

            // Get the form that we are dealing with and, within that form,
            // find the submit button and the field that we want to change.
            //final HtmlForm form = page1.getFormByName("navbar-form");
            HtmlForm form = page1.getFirstByXPath("//form[@id='navbar-form']");

            // Locate the search button and the query field by their ids
            HtmlButton button = form.getFirstByXPath("/html/body//form//button[@id='navbar-submit-button']");
            HtmlTextInput textField = form.getFirstByXPath("/html/body//form//input[@id='navbar-query']");

            // Change the value of the text field
            textField.setValueAttribute("Sleepers");

            // Now submit the form by clicking the button and get back the second page.
            HtmlPage page2 = button.click();

            // form = page2.getElementByName("s");

            //page2 = page2.getFirstByXPath("/html/body//form//div//tr[@href]");

            System.out.println("content: " + page2.asText());

            webClient.closeAllWindows();
        } catch (IOException ex) {
            Logger.getLogger(IMdB.class.getName()).log(Level.SEVERE, null, ex);
        }

        System.out.println("END");
    }
}
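In the question's code, the XPaths passed to form.getFirstByXPath start with /html/body, so under standard XPath rules they are evaluated from the document root and the form used as the context node is effectively ignored; it is the id predicates that make them land on the right elements. A minimal sketch of the difference, assuming the JDK's built-in javax.xml.xpath (not HtmlUnit) and a hypothetical two-form page with invented ids b1 and b2:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XPathContextDemo {
    // Hypothetical page with two forms; the ids are made up for illustration.
    static final String PAGE =
        "<html><body>"
        + "<form id='other-form'><button id='b1'>Other</button></form>"
        + "<form id='navbar-form'><button id='b2'>Go</button></form>"
        + "</body></html>";

    // Evaluates expr with the navbar form as the context node and returns the id
    // of the first matching element in document order, or null if nothing matches.
    static String firstIdFromNavbarForm(String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(PAGE)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node form = (Node) xpath.evaluate("//form[@id='navbar-form']", doc, XPathConstants.NODE);
        Node hit = (Node) xpath.evaluate(expr, form, XPathConstants.NODE);
        return hit == null ? null : ((Element) hit).getAttribute("id");
    }

    public static void main(String[] args) throws Exception {
        // A leading '/' restarts at the document root, so the context form is ignored.
        System.out.println("absolute: " + firstIdFromNavbarForm("/html/body//form//button"));
        // './/' searches only the descendants of the context form.
        System.out.println("relative: " + firstIdFromNavbarForm(".//button"));
    }
}
```

Running main prints "absolute: b1" and then "relative: b2". Scoping the lookup with .// (for example .//button[@id='navbar-submit-button']) keeps it inside the form, which matters when there is no unique id to rely on.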

Upvotes: 1

Views: 1969

Answers (2)

Mosty Mostacho

Reputation: 43434

You should do that this way:

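// Load the search results page (the URL is a placeholder)
HtmlPage htmlPage = new WebClient().getPage("http://imdb.com/blah");
// The first result's link sits inside the first td with class 'primary_photo'
HtmlAnchor anchor = htmlPage.getFirstByXPath("//td[@class='primary_photo']//a");
System.out.println(anchor.getHrefAttribute());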
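This approach relies on getFirstByXPath returning the first match in document order, which should be the top result. A minimal sketch of the same selection, assuming the JDK's javax.xml.xpath instead of HtmlUnit and a simplified, hypothetical results table (the class names come from the XPath above; the markup and hrefs are invented), shows which link the expression picks:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class FirstResultDemo {
    // Simplified, hypothetical stand-in for the results table; the real markup differs.
    static final String RESULTS =
        "<html><body><table>"
        + "<tr><td class='primary_photo'><a href='/title/first-result/'>img</a></td>"
        + "<td class='result_text'><a href='/title/first-result/'>Sleepers</a> (1996)</td></tr>"
        + "<tr><td class='primary_photo'><a href='/title/second-result/'>img</a></td>"
        + "<td class='result_text'><a href='/title/second-result/'>Sleepers</a></td></tr>"
        + "</table></body></html>";

    // Returns the href of the first <a> inside a td whose class is exactly
    // 'primary_photo' (first match in document order), or null if there is none.
    static String firstResultHref(String html) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(html)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        Element anchor = (Element) xpath.evaluate(
            "//td[@class='primary_photo']//a", doc, XPathConstants.NODE);
        return anchor == null ? null : anchor.getAttribute("href");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstResultHref(RESULTS)); // prints /title/first-result/
    }
}
```

In HtmlUnit itself, calling click() on the selected HtmlAnchor returns the page the link leads to. Note that @class='primary_photo' only matches when the class attribute is exactly that string; if the cell carries several classes, contains(concat(' ', normalize-space(@class), ' '), ' primary_photo ') is the usual XPath 1.0 workaround.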

Upvotes: 1

dirtydexter

Reputation: 1073

I would suggest using the IMDb API rather than doing all that.

IMDb currently has two public APIs that, although undocumented, are very quick and reliable (they are used on IMDb's own site through AJAX).

  1. A statically cached search suggestions API:

  2. More advanced search

Upvotes: 0
