Gob

Reputation: 73

Java HtmlUnit webpage scraper newPage not accessible

I'm writing a scraper for the page dscan.me. It's supposed to fill the form with content and submit it with the submit input button. I don't see any problem here, but I've tried everything I know about HtmlUnit (and that's not much): firing the submit event, executing JavaScript and getting the new page from the result... Nothing worked. I'd be glad if anybody with more experience could post a working solution.

This is how I'm getting the controls and setting the data in the textArea:

HtmlForm form = page.getForms().get(0);
HtmlTextArea textArea = form.getTextAreaByName("scandata");
HtmlSubmitInput button = form.getInputByValue("Submit");

textArea.setText(paste);

I'm sure that I have the correct controls and that the textArea gets filled, but this just terminates with a NullPointerException on the getNewPage() call:

ScriptResult scriptResult = button.fireEvent(Event.TYPE_SUBMIT);

WebClientProvider.getSharedClient().waitForBackgroundJavaScript(10000);

HtmlPage res = (HtmlPage) scriptResult.getNewPage();

And this gives me the default page with the controls as the result, not the page with the processed content:

String js_set = "$(\".inputbox\").val(\""+ paste.replaceAll("\n", "\\n").replaceAll("\t", "\\t") +"\");\n";     
String js_submit = "$(\".submitbutton\").click();";         
ScriptResult result = page.executeJavaScript(js_submit);
WebClientProvider.getSharedClient().waitForBackgroundJavaScript(10000);

HtmlPage res = (HtmlPage) result.getNewPage();

Here is an example of data you can paste to dscan.me to see the workflow. If you have an idea or find a solution or workaround, I'd be glad for anything. Thank you!

Upvotes: 0

Views: 261

Answers (1)

Tasawer Nawaz

Reputation: 925

Sometimes JavaScript needs time to execute, so you have to wait for it to finish. The best approach is to retry for a while until the page has been updated (checking any condition that suits you). Here is example code:

HtmlForm form = page.getForms().get(0);
HtmlTextArea textArea = form.getTextAreaByName("scandata");
HtmlSubmitInput button = form.getInputByValue("Submit");
textArea.setText(paste); // fill the form before submitting, as in the question
HtmlPage res = button.click(); // click() returns the page that results from the submit
int inputCount = res.getByXPath("//input").size();
int tries = 5;
while (tries > 0 && inputCount < 12) { // change the number of tries and the condition to suit your needs
    tries--;
    synchronized (res) {
        res.wait(2000); // pause for 2 seconds so background JavaScript can run
    }
    inputCount = res.getByXPath("//input").size(); // the input count is only an example condition
}
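
The retry loop above can be pulled into a small reusable helper. This is only a sketch of the same idea, not part of HtmlUnit: the class name PollUntil and the method waitUntil are made up for illustration, and it uses Thread.sleep instead of wait() on the page object.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper (not an HtmlUnit class): polls a condition a limited number of times.
final class PollUntil {
    private PollUntil() {}

    /**
     * Checks the condition up to maxTries times, sleeping delayMillis between checks.
     * Returns true as soon as the condition holds, false if it never did
     * or if the thread was interrupted while sleeping.
     */
    static boolean waitUntil(BooleanSupplier condition, int maxTries, long delayMillis) {
        for (int attempt = 0; attempt < maxTries; attempt++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (attempt < maxTries - 1) { // no need to sleep after the last check
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag set
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in condition: becomes true on the third check.
        int[] checks = {0};
        boolean ready = waitUntil(() -> ++checks[0] >= 3, 5, 100);
        System.out.println("ready=" + ready + " after " + checks[0] + " checks");
    }
}
```

With the code from the answer, the condition could be something like () -> res.getByXPath("//input").size() >= 12, with 5 tries and a 2000 ms delay. The threshold of 12 inputs is only the example condition from above, so pick whatever reliably marks the processed page.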

Upvotes: 1
