Extracting name of an image from an src attribute (url)

Question

I'm trying to scrape images using JSoup and don't understand a piece of code that I stumbled upon.

Part of the code: (src in this case is defined as an absolute url)

private static void getImages(String src) throws IOException {

    String folder = null;

    //Extract the name of the image from the src attribute
    int indexname = src.lastIndexOf("/");

    if (indexname == src.length()) { // Don't understand this
        src = src.substring(1, indexname);
    }

    indexname = src.lastIndexOf("/");
    String name = src.substring(indexname, src.length());

    // more code
}

I don't understand the if statement. More specifically, when will indexname ever equal the length of src?

RealSkeptic · Accepted Answer

Don't assume every source you find on the internet is good.

That piece of code has many problems.

Indeed, the only case where the result of String.lastIndexOf equals the length of the source string is when the search string is "". Here the search string is "/", so the largest value it can return is src.length() - 1, and that if block is never executed.
The operation inside that if block (delete the first character of the string) is not really helpful.
It is perfectly legal to add slashes to a URL even after the image name. Try adding '?/' to an image name in a URL.
It is also perfectly legal not to have an image name at all. There could be the name of a script with parameters there, such as "http://example.com/generate-captcha.php?param1=foo&param2=bar" (not a real link, just an example).
You could even have nothing at all after the domain name.
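To make the first point concrete, here is a minimal sketch. The class and method names are made up for illustration, and stripTrailingSlash is only a guess at what the original author may have intended (dropping a trailing slash). The code in the question does not say.

```java
final class LastIndexOfCheck {
    // The condition from the question. With a one-character search string,
    // lastIndexOf returns at most src.length() - 1, so this is always false.
    static boolean originalCondition(String src) {
        return src.lastIndexOf("/") == src.length();
    }

    // Assumption: the author may have meant to drop a trailing slash.
    // The isEmpty() guard matters: for "" both lastIndexOf and length() - 1 are -1.
    static String stripTrailingSlash(String src) {
        int last = src.lastIndexOf("/");
        if (!src.isEmpty() && last == src.length() - 1) {
            return src.substring(0, last);
        }
        return src;
    }
}
```

Calling originalCondition with any string, including one that ends in a slash, returns false, which is why the if block in the question is dead code.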

There is no law that says a URL has to have a file name after the last slash, or that such a file name has to be the name of the actual image, so this code works only part of the time.
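If you still want a best-effort name, one option is to let java.net.URI separate the path from the query before looking for the last slash, and to treat "no name" as a normal outcome. This is only a sketch under those assumptions: ImageNames and fileNameOf are hypothetical names, not part of JSoup or of the code in the question, and the result may still be a script name rather than an image name.

```java
import java.net.URI;
import java.util.Optional;

final class ImageNames {
    // Hypothetical helper: returns the last path segment of src, if there is one.
    static Optional<String> fileNameOf(String src) {
        String path;
        try {
            // getPath() excludes the query, so a '/' after '?' is ignored.
            path = URI.create(src).getPath();
        } catch (IllegalArgumentException e) {
            // src is not a syntactically valid URI.
            return Optional.empty();
        }
        if (path == null || path.isEmpty()) {
            // Opaque URI, or nothing at all after the domain name.
            return Optional.empty();
        }
        // lastIndexOf returns -1 when there is no slash, so slash + 1 is 0.
        int slash = path.lastIndexOf('/');
        String name = path.substring(slash + 1);
        // A path that ends in '/' has no file name.
        return name.isEmpty() ? Optional.empty() : Optional.of(name);
    }
}
```

With this sketch, a URL such as "http://example.com/img/cat.png?/" yields cat.png, while "http://example.com" and "http://example.com/images/" yield an empty Optional. The caller then has to decide on a fallback name for those cases.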
