prathameshr

Reputation: 59

Get Images from website

I need to download all images from a website for which I have a username and password. Say an image URL is http://example.co.in/images/Photos/ABC123.jpg; there are many such images, and my requirement is to download all of them. How can this be done in Java, C++ or any other programming language? Sample code would be helpful. Thanks.

I am using the following code to get an image from the Google website:

import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.URL;

class Test {
    public static void main(String args[]) throws Exception {
        System.out.println("Hello World");

        // Open a stream to the image URL and read it fully into memory.
        URL url = new URL("http://www.google.co.in/images/google_favicon_128.png");
        InputStream in = new BufferedInputStream(url.openStream());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        int n = 0;
        while (-1 != (n = in.read(buf))) {
            out.write(buf, 0, n);
        }
        out.close();
        in.close();
        byte[] response = out.toByteArray();

        // Write the downloaded bytes to a local file.
        FileOutputStream fos = new FileOutputStream("C://ABC//google1.jpg");
        fos.write(response);
        fos.close();
    }
}

Now I need help with the case where I don't know the names of the images, say all images with the extension .jpg (*.jpg), and they should be stored in my folder as 1.jpg, 2.jpg and so on. So how do I get the number of images, and how do I access their names under http://www.google.co.in/images/?

Upvotes: 2

Views: 23546

Answers (5)

Dan

Reputation: 1179

Use crawler4j; it's a pretty easy web crawler library to work with.

Their sample demo shows how to grab the URLs contained within a page:

 List<WebURL> links = htmlParseData.getOutgoingUrls();

I wouldn't use byte streams to write a file. You could make a File object from a URI and then figure out how to write the file to disk.

File imageFile = new File(new URL("http://www.foo.someImage.jpg").toURI());

Another possible solution is to use ImageIO:

URL url = new URL("http://www.foo.someImage.jpg");

BufferedImage image = ImageIO.read(url);

if (image != null) {
    ImageIO.write(image, "jpg", new File("file path to save"));
}

Upvotes: -1

amit

Reputation: 178451

You are looking for a crawler. I recently used scrapy, and it is fairly simple to use.

All you have to do is create your Spider. It needs to generate a Request object for each URL you encounter, and you can download the images on a page by issuing an extra HTTP request for each image.

Follow the tutorial; it is really simple and leads you step by step, explaining how to build your crawler and providing a code example as well.

Note that scrapy works with Python.

There is also crawler4j for Java. I am not personally familiar with it, but I have heard it is very simple as well.

Upvotes: 6

Jeroen Vannevel

Reputation: 44439

You're looking for a crawler. If you're going for Java, I would recommend using Jsoup to parse the HTML from websites. Combined with commons.io you should be able to do this fairly easily.

If you want a full code example, you can take a look at a recent project I made that does exactly this.

Basically, all you have to do is collect the links from the src attributes of the img tags and then download those images. You can add extra filters such as image type or size, but that's at your own discretion.
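A minimal sketch of that approach, assuming Jsoup and Commons IO are on the classpath (the page URL and output folder below are placeholders based on the question, not tested against the real site):

import java.io.File;
import java.net.URL;

import org.apache.commons.io.FileUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ImageDownloader {
    public static void main(String[] args) throws Exception {
        // Placeholder page URL; replace with the page that lists your images.
        Document doc = Jsoup.connect("http://example.co.in/images/Photos/").get();

        int count = 1;
        // Select every <img> element that has a src attribute.
        for (Element img : doc.select("img[src]")) {
            // "abs:src" resolves the src attribute against the page URL.
            String src = img.attr("abs:src");
            if (src.toLowerCase().endsWith(".jpg")) {
                // Save as 1.jpg, 2.jpg, ... as asked for in the question.
                FileUtils.copyURLToFile(new URL(src), new File("C:/ABC/" + count + ".jpg"));
                count++;
            }
        }
    }
}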

Upvotes: 0

Jayamohan

Reputation: 12924

It looks like you are comfortable with Java, so you can try using HtmlUnit.

It has an HtmlPage.getElementsByTagName("img") method, which will give you the list of image tags in the HTML. From the result you can read each image's src location and download the image from that URL using your test method.

You can also use other HTML parsers like SimpleHTML Parser, JTidy, etc. to parse the HTML and get the image tags.
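A rough sketch of that idea, built around the getElementsByTagName call mentioned above (the URL is a placeholder, and exact return types differ between HtmlUnit versions, so treat this as an outline rather than tested code):

import java.net.URL;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.DomElement;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class HtmlUnitImageLister {
    public static void main(String[] args) throws Exception {
        WebClient webClient = new WebClient();

        // Placeholder URL for the page containing the images.
        HtmlPage page = webClient.getPage("http://example.co.in/images/Photos/");

        // List every <img> tag on the page and resolve its src to an absolute URL.
        for (DomElement img : page.getElementsByTagName("img")) {
            String src = img.getAttribute("src");
            URL imageUrl = new URL(page.getUrl(), src);
            System.out.println(imageUrl);
            // Download each imageUrl with the stream code from the question.
        }
    }
}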

Upvotes: 2

MrSmith42

Reputation: 10151

If you are a Linux user, have a look at the command-line tool wget.

As a Windows user you can use it via Cygwin (http://www.cygwin.com/).
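For example, something along the following lines should fetch the .jpg files below the question's folder, assuming the server exposes a directory listing and uses HTTP authentication (USERNAME and PASSWORD are placeholders):

wget -r -np -A jpg --user=USERNAME --password=PASSWORD http://example.co.in/images/Photos/

Here -r recurses into the folder, -np keeps wget from climbing to parent directories, and -A restricts the download to the given suffix.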

Upvotes: 0
