Ethan Guillotte

Reputation: 37

Element Not Found Exception: HTMLUnit - Search By ID

I am using HtmlUnit for the first time to extract data from specific pages. Right now I am trying to grab an HTML element (a search box) by its ID.

But I am running into this exception:

Exception in thread "main" com.gargoylesoftware.htmlunit.ElementNotFoundException: elementName=[*] attributeName=[id] attributeValue=[space_search_keyword]
    at com.gargoylesoftware.htmlunit.html.HtmlPage.getHtmlElementById(HtmlPage.java:1547)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.getHtmlElementById(HtmlPage.java:1517)
    at Test.main(Test.java:33)

This is my code:

import java.util.List;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HTMLParserListener;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class Test {

    public static void main(String[] args) {

        HtmlPage page = null;

        WebClient client = new WebClient();
        client.setCssEnabled(false); 
        client.setJavaScriptEnabled(false);

        try {  
          String searchUrl = "https://25live.collegenet.com/umassd/#space_search[0]";
          page = client.getPage(searchUrl);
        }catch(Exception e){
          e.printStackTrace();
        }


        //System.out.println(page.asXml());
        HtmlElement searchBox = (HtmlElement)page.getHtmlElementById("space_search_keyword");

    }

}

After inspecting the output of page.asXml(), it seems that the page isn't loading properly, which would explain why the element can't be found. I'm not sure why it doesn't load in HtmlUnit. No login is required; you can see the page for yourself by opening the URL in a browser.

Any help with debugging HTMLUnit issues like this would be greatly appreciated.

Upvotes: 1

Views: 539

Answers (1)

rustyx

Reputation: 85541

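The site is a SPA (single-page application) written in Angular. The HTML the server returns is only a shell; the search box is created by JavaScript running in the browser, so you need JavaScript to run it.

Unfortunately, HtmlUnit's JavaScript support is insufficient to run Angular, so your approach won't work.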
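As a small illustration of why the fetched HTML lacks the search box: the "#space_search[0]" part of the URL is a fragment, which HTTP clients do not send to the server. Only client-side JavaScript ever sees it. The sketch below just splits the URL from the question with the JDK's java.net.URL; the class and method names are made up for this example, and newer JDKs mark the URL constructor as deprecated.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Illustrative helper (names are hypothetical, not part of HtmlUnit).
class FragmentDemo {

    // The resource an HTTP client asks the server for: scheme, host and path.
    // Ports and query strings are ignored here for brevity.
    public static String requestedResource(String address) {
        URL url = parse(address);
        return url.getProtocol() + "://" + url.getHost() + url.getPath();
    }

    // The fragment after '#', which stays on the client and is only
    // interpreted by JavaScript running in the page.
    public static String clientSideFragment(String address) {
        return parse(address).getRef();
    }

    private static URL parse(String address) {
        try {
            return new URL(address);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + address, e);
        }
    }

    public static void main(String[] args) {
        String searchUrl = "https://25live.collegenet.com/umassd/#space_search[0]";
        // prints: Requested from server: https://25live.collegenet.com/umassd/
        System.out.println("Requested from server: " + requestedResource(searchUrl));
        // prints: Left to client-side JS: space_search[0]
        System.out.println("Left to client-side JS: " + clientSideFragment(searchUrl));
    }
}
```

So with JavaScript unavailable, every "#..." route on this site returns the same shell document, whatever the fragment says.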

You can try one of the following:

  • Reverse-engineer the page and fetch the underlying resource that the SPA is accessing directly
  • Use Selenium with ChromeDriver (it actually opens Chrome and simulates user actions such as button clicks on the page)
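A minimal sketch of the first option, using the JDK's built-in java.net.http.HttpClient (Java 11+). The real endpoint is not known here: you would find it in the browser's network tab while the page loads and pass it as the first argument. The class name, the helper buildRequest, and the "Accept: application/json" header are assumptions for illustration; the site may return another format or require extra headers or cookies.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Illustrative only: fetches a data URL directly instead of rendering the SPA.
class DirectFetch {

    // Builds a plain GET request for the given URL.
    // ASSUMPTION: the endpoint serves JSON; adjust the Accept header if not.
    public static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/json")
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length == 0) {
            // No endpoint is hard-coded because the real one is not known here.
            System.out.println("Usage: java DirectFetch <url-observed-in-network-tab>");
            return;
        }
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildRequest(args[0]), HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP status: " + response.statusCode());
        System.out.println(response.body());
    }
}
```

If the response turns out to be structured data, it can be parsed directly and no HTML element lookup is needed at all. If the endpoint depends on session state set up by the page, the Selenium route is likely the simpler one.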

Upvotes: 2
