Vernon

Reputation: 421

How do I make Selenium's HtmlUnitDriver download images automatically?

I found these three potential answers, but they all use the HtmlUnit API. How can I avoid the HtmlUnit API and use only Selenium, or some configuration in the browser setup?

Upvotes: 0

Views: 441

Answers (2)

Ahmed Ashour

Reputation: 5549

This is now supported in HtmlUnit 2.25-snapshot via webClient.getOptions().setDownloadImages(true).

In HtmlUnit-Driver 2.25-snapshot, it can be enabled through the DOWNLOAD_IMAGES_CAPABILITY capability or by calling htmlUnitDriver.setDownloadImages(true).

Upvotes: 3

stiemannkj1

Reputation: 4549

As far as I know, there is no way to automatically download all images with HtmlUnit (either with or without Selenium). As the links you posted indicate, you can force HtmlUnit to download all the images on the page with the following code:

DomNodeList<DomElement> imageElements = htmlPage.getElementsByTagName("img");

for (DomElement imageElement : imageElements) {
    HtmlImage htmlImage = (HtmlImage) imageElement;
    try {
        // Requesting the image reader forces HtmlUnit to download the image.
        htmlImage.getImageReader();
    } catch (IOException e) {
        // Ignore images that fail to download.
    }
}

However, getting the current page when using Selenium's HtmlUnitDriver is not trivial. There are multiple ways to do it, but all of them require access to the protected HtmlUnitDriver.lastPage() method. One way to access this method is through reflection. Another is to take advantage of two facts: protected methods are also accessible to classes in the same package, and the same package can span multiple jars. Combining these two features (or design flaws), I came up with a solution that avoids reflection: it simply adds a normal class to the same package as HtmlUnitDriver, org.openqa.selenium.htmlunit.

package org.openqa.selenium.htmlunit;

import java.io.IOException;

import com.gargoylesoftware.htmlunit.html.DomElement;
import com.gargoylesoftware.htmlunit.html.DomNodeList;
import com.gargoylesoftware.htmlunit.html.HtmlImage;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitUtil {

    private HtmlUnitUtil() {
        throw new AssertionError();
    }

    public static void loadImages(HtmlUnitDriver htmlUnitDriver) {

        // Since we are in the same package (org.openqa.selenium.htmlunit)
        // as HtmlUnitDriver, we can access HtmlUnitDriver's protected
        // lastPage() method.
        HtmlPage htmlPage = (HtmlPage) htmlUnitDriver.lastPage();
        DomNodeList<DomElement> imageElements =
            htmlPage.getElementsByTagName("img");

        for (DomElement imageElement : imageElements) {

            HtmlImage htmlImage = (HtmlImage) imageElement;

            try {

                // Download the image.
                htmlImage.getImageReader();
            }
            catch (IOException e) {
                // do nothing.
            }
        }
    }
}

Unfortunately, you will need to manually call this code each time you want images to be loaded. I have created a feature request (htmlunit-driver #40) for HtmlUnitDriver to add an option to automatically download images. Please vote for that issue if you want to see this feature.
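For completeness, the reflection route mentioned earlier could look roughly like the sketch below. It is only an illustration, not tested against Selenium: FakeDriver is a hypothetical stand-in for HtmlUnitDriver so the example compiles with the JDK alone, and callNoArgMethod is a helper name I made up. With the real driver you would pass the HtmlUnitDriver instance and "lastPage", assuming that method is still protected and takes no arguments in your version.

```java
import java.lang.reflect.Method;

public class ProtectedAccessSketch {

    // Hypothetical stand-in for HtmlUnitDriver; it only mimics the shape
    // of a protected, no-argument lastPage() method.
    static class FakeDriver {
        protected String lastPage() {
            return "<html>stub page</html>";
        }
    }

    // Searches the target's class hierarchy for a no-argument method with
    // the given name, makes it accessible, and invokes it.
    static Object callNoArgMethod(Object target, String methodName) {
        for (Class<?> type = target.getClass(); type != null; type = type.getSuperclass()) {
            try {
                Method method = type.getDeclaredMethod(methodName);
                // Needed when the caller lives in another package.
                method.setAccessible(true);
                return method.invoke(target);
            } catch (NoSuchMethodException e) {
                // Not declared on this class; continue with the superclass.
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Could not invoke " + methodName, e);
            }
        }
        throw new IllegalStateException("No method named " + methodName);
    }

    public static void main(String[] args) {
        Object page = callNoArgMethod(new FakeDriver(), "lastPage");
        System.out.println(page);
    }
}
```

Compared with the same-package class above, this avoids placing code in org.openqa.selenium.htmlunit, but it gives up compile-time checking: if the method is renamed or its signature changes, the failure only shows up at runtime.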

Upvotes: 2
