I'm using HtmlUnit for the first time to extract data from specific pages. Right now I'm trying to get an HTML element (a search box) by its ID, but I get this exception:
Exception in thread "main" com.gargoylesoftware.htmlunit.ElementNotFoundException: elementName=[*] attributeName=[id] attributeValue=[space_search_keyword]
at com.gargoylesoftware.htmlunit.html.HtmlPage.getHtmlElementById(HtmlPage.java:1547)
at com.gargoylesoftware.htmlunit.html.HtmlPage.getHtmlElementById(HtmlPage.java:1517)
at Test.main(Test.java:33)
This is my code:
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlElement;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class Test {
    public static void main(String[] args) throws Exception {
        String searchUrl = "https://25live.collegenet.com/umassd/#space_search[0]";
        WebClient client = new WebClient();
        // In HtmlUnit 2.11 and later these setters live on WebClientOptions
        client.getOptions().setCssEnabled(false);
        client.getOptions().setJavaScriptEnabled(false);
        // Let a failed fetch propagate instead of continuing with a null page
        HtmlPage page = client.getPage(searchUrl);
        //System.out.println(page.asXml());
        // Throws ElementNotFoundException if no element with this id is in the DOM
        HtmlElement searchBox = page.getHtmlElementById("space_search_keyword");
    }
}
Inspecting the result of page.asXml(), it looks like the page isn't loading properly, which would explain why the element can't be found. I'm not sure why it doesn't load in HtmlUnit. No login is needed; you can see the page come up by opening the URL in a browser.
Any help with debugging HtmlUnit issues like this would be greatly appreciated.
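One way to confirm what the page.asXml() inspection suggests is to check the markup HtmlUnit actually returned for the ID before calling getHtmlElementById. Below is a minimal, stdlib-only sketch; the class IdCheck and the method containsElementId are hypothetical names, the regex is a rough text search rather than an HTML parser, and the two sample strings are made up to stand in for a fully rendered page and an unrendered application shell.

```java
import java.util.regex.Pattern;

class IdCheck {
    // Hypothetical helper: true if the markup has an attribute id="<id>" or id='<id>'.
    // The negative lookbehind stops attributes such as data-id="..." from matching.
    // This is a rough text search, not an HTML parser.
    static boolean containsElementId(String markup, String id) {
        Pattern p = Pattern.compile("(?<![\\w-])id\\s*=\\s*([\"'])" + Pattern.quote(id) + "\\1");
        return p.matcher(markup).find();
    }

    public static void main(String[] args) {
        // Made-up samples: a rendered page vs. an unrendered application shell
        String rendered = "<html><body><input id=\"space_search_keyword\" type=\"text\"/></body></html>";
        String shell = "<html><body><div ng-view=\"\"></div></body></html>";
        System.out.println(containsElementId(rendered, "space_search_keyword")); // true
        System.out.println(containsElementId(shell, "space_search_keyword"));    // false
    }
}
```

With HtmlUnit you would pass page.asXml() as markup. A false result means the element is simply not in the DOM HtmlUnit built, so the problem lies in how the page gets rendered, not in the lookup call.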
Answer:
The site is a single-page application (SPA) written in Angular, so it needs JavaScript to render its content.
Unfortunately, HtmlUnit's JavaScript support is not sufficient to run Angular, so your approach won't work.
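The URL itself hints at why. Everything after # is a fragment, which is not sent to the server as part of the HTTP request. Assuming the app uses hash-based routing, as #space_search[0] suggests, the server only sees /umassd/ and returns the Angular shell; the search view containing space_search_keyword is built afterwards by JavaScript. The sketch below (the class FragmentDemo and the method splitFragment are hypothetical, and it uses plain string handling rather than a full URL parser) shows which part of the URL is actually requested.

```java
class FragmentDemo {
    // Hypothetical helper: returns {requestedUrl, fragment}.
    // Everything after the first '#' is the fragment; only the part before it is requested from the server.
    static String[] splitFragment(String url) {
        int hash = url.indexOf('#');
        if (hash < 0) {
            return new String[] {url, ""};
        }
        return new String[] {url.substring(0, hash), url.substring(hash + 1)};
    }

    public static void main(String[] args) {
        String[] parts = splitFragment("https://25live.collegenet.com/umassd/#space_search[0]");
        System.out.println("Sent to server: " + parts[0]);      // https://25live.collegenet.com/umassd/
        System.out.println("Client-side route: " + parts[1]);   // space_search[0]
    }
}
```

Without JavaScript, only the first part is fetched and nothing interprets the second, which is consistent with the empty page seen in page.asXml().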
You can try: