Burschken

Reputation: 87

Scraping with HtmlUnit in Java (How to find the elements)

I need to get about 70 documents from www.genios.de. Every document has its own link, and you have to log in to the website to get access to the documents.

While I could do this manually, I want to do it in Java to improve my coding skills.

I found HtmlUnit, which seems to provide all the methods I need. My problem is that I can't get hold of the text fields for username and password or the login button.

I tried several approaches, but none of them worked. One attempt was the following code:

    final WebClient webClient = new WebClient();
    final HtmlPage page1 = webClient.getPage("http://www.genios.de");
    // getForms() already returns List<HtmlForm>, so no cast is needed
    final List<HtmlForm> forms = page1.getForms();
    final HtmlForm form = forms.get(0);
    // This lookup is the call that throws the exception shown below
    HtmlInput usernameInput = form.getInputByName("loginBlock_username");

Resulting in:

    Exception in thread "main" com.gargoylesoftware.htmlunit.ElementNotFoundException: elementName=[input] attributeName=[name] attributeValue=[loginBlock_username]
        at com.gargoylesoftware.htmlunit.html.HtmlForm.getInputByName(HtmlForm.java:469)
        at GeniosLogin.main(GeniosLogin.java:26)

Upvotes: 1

Views: 1337

Answers (1)

James McGuigan

Reputation: 107

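'loginBlock_username' is the id of the field, not its name. getInputByName matches on the name attribute (the exception message shows attributeName=[name]), and the actual name of the field you are trying to get is 'loginBlock.username'. Pass that value to getInputByName instead.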
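
To make the id/name distinction concrete, here is a minimal sketch that uses only the JDK's XML and XPath classes rather than HtmlUnit. The form markup is a hypothetical, simplified stand-in for the real page (only the id and name values are taken from above), and `FormFieldLookup` and `findInput` are names made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class FormFieldLookup {

    // Hypothetical, simplified login form. Only the id and name values
    // of the username field are taken from the question and answer.
    static final String FORM_MARKUP =
        "<form>"
        + "<input type=\"text\" id=\"loginBlock_username\" name=\"loginBlock.username\"/>"
        + "</form>";

    // Returns the first <input> whose given attribute equals the given value,
    // or null if no such element exists.
    static Element findInput(String attribute, String value) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(FORM_MARKUP)));
        String expression = "//input[@" + attribute + "='" + value + "']";
        return (Element) XPathFactory.newInstance().newXPath()
            .evaluate(expression, doc, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        // Searching the name attribute for the id value finds nothing,
        // which mirrors the ElementNotFoundException in the question.
        System.out.println(findInput("name", "loginBlock_username") != null); // false
        // The same value matches when the id attribute is searched.
        System.out.println(findInput("id", "loginBlock_username") != null);   // true
        // The real name, with a dot instead of an underscore, matches by name.
        System.out.println(findInput("name", "loginBlock.username") != null); // true
    }
}
```

In the HtmlUnit code from the question, the corresponding change would be `form.getInputByName("loginBlock.username")`, assuming the username field sits inside the first form on the page.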

Upvotes: 1
