Ashir Ali

Reputation: 13

Download Images Using Selenium in C#

I want to download the first 10 images from the Google Images search URL used in the code below. I have tried many approaches but cannot download the images, because the URLs I get are very long strings that look like junk data.

Here is my C# code:

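using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

var driver = new ChromeDriver();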

driver.Navigate().GoToUrl("https://www.google.com/search?q=wallpapers+pics&tbm=isch&ved=2ahUKEwizlL6W6sLxAhUE_4UKHS_YBDEQ2-cCegQIABAA&oq=wallpapers+pics&gs_lcp=CgNpbWcQAzICCAAyAggAMgIIADICCAAyAggAMgIIADICCAAyAggAMgIIADICCAA6BwgAELEDEEM6BAgAEENQ3I8FWJ6YBWDsnQVoAHAAeACAAasCiAHYCZIBAzItNZgBAKABAaoBC2d3cy13aXotaW1nwAEB&sclient=img&ei=bjXeYLOlJ4T-lwSvsJOIAw&bih=802&biw=1707");



// Three alternative ways I tried to select the list of images (commented out)

//IList<IWebElement> Imghref = driver.FindElements(By.XPath("//img[@jsname]"));

//IList<IWebElement> Imghref = driver.FindElements(By.ClassName("rg_i"));

//IList<IWebElement> Imghref = driver.FindElements(By.TagName("img"));

IList<IWebElement> Imghref = driver.FindElements(By.ClassName("rg_i"));

// BcuVif - n3VNCb     --- other class names I have observed

// Only the first 10 thumbnails are wanted
foreach (IWebElement eachLink in Imghref.Take(10))
{
    // Clicking a thumbnail opens the larger preview
    eachLink.Click();

    // Takes the first <img> element found on the page
    IWebElement Images = driver.FindElement(By.TagName("img"));
    //Console.WriteLine(Images.GetAttribute("class"));
    string ImageUrl = Images.GetAttribute("src");
    string ImageName = Images.GetAttribute("alt");
    Console.WriteLine("Image URL : " + ImageUrl);

    // WebClient is IDisposable, so release it after each download
    using (var downloader = new WebClient())
    {
        downloader.DownloadFile(ImageUrl, "D:\\VisualStudio Workspace\\Download-Images\\images\\" + ImageName + ".jpg");
    }
}

Upvotes: 1

Views: 1347

Answers (3)

Muhammad Ashir Ali

Reputation: 65

After adding this code to my original code, everything works fine. Thanks, everyone!

string converted = base64String.Replace("data:image/jpeg;base64,", string.Empty);
byte[] bytes = Convert.FromBase64String(converted);
File.WriteAllBytes(@"D:\VisualStudio Workspace\Download-Images\images\image.jpg", bytes); // @"..." keeps the backslashes literal
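The Replace call above only strips the JPEG prefix, so a GIF or PNG thumbnail would still fail to decode. A minimal sketch that removes any data: prefix by cutting at the first comma; the helper name StripDataUriPrefix is hypothetical, not part of Selenium or .NET:

```csharp
using System;

static class DataUriPrefix
{
    // Returns everything after the first comma of a data: URI, so
    // "data:image/gif;base64,R0lGODlh" -> "R0lGODlh". Works for any media type.
    public static string StripDataUriPrefix(string src)
    {
        if (src == null || !src.StartsWith("data:", StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException("src is not a data: URI", nameof(src));

        int comma = src.IndexOf(',');
        if (comma < 0)
            throw new FormatException("data: URI has no ',' separator");

        return src.Substring(comma + 1);
    }
}
```

It would be used as `byte[] bytes = Convert.FromBase64String(DataUriPrefix.StripDataUriPrefix(base64String));`, assuming the data: URI is base64-encoded.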

Upvotes: 0

lidqy

Reputation: 2453

In your question you used a Google Images URL and then fetched all elements with the class name "rg_i". At least some of the images with this class name (if not all) expose their content as base64, just as Giorgi said. However, the format is GIF, not JPEG or PNG, like <img src="data:image/gif;base64,...">.

So you need to carefully check the beginning of the src attribute's value. If it is a URL, download it as your code already does (making sure the file name is valid); otherwise, decode it as base64 data and save the resulting bytes to a file with File.WriteAllBytes or a FileStream.
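A rough sketch of that branching, assuming the src and alt values come from GetAttribute as in the question and that data: URIs use the base64 form. ImgSrcSaver, IsDataUri, SafeFileName and SaveAsync are hypothetical names, and HttpClient is swapped in for WebClient:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

static class ImgSrcSaver
{
    // True when src embeds the image bytes instead of pointing at a URL.
    public static bool IsDataUri(string src) =>
        src != null && src.StartsWith("data:", StringComparison.OrdinalIgnoreCase);

    // Builds a file name from the alt text: characters invalid on Windows (or on the
    // current OS) become '_', and an empty alt falls back to the given name.
    public static string SafeFileName(string alt, string fallback)
    {
        if (string.IsNullOrWhiteSpace(alt)) return fallback;

        var invalid = new HashSet<char>(Path.GetInvalidFileNameChars());
        foreach (char c in "<>:\"/\\|?*") invalid.Add(c);

        var sb = new StringBuilder(alt.Length);
        foreach (char c in alt.Trim()) sb.Append(invalid.Contains(c) ? '_' : c);
        return sb.ToString();
    }

    // Decodes a data: URI locally; downloads anything else over HTTP.
    // Assumes data: URIs are base64-encoded (";base64,").
    public static async Task SaveAsync(HttpClient http, string src, string path)
    {
        byte[] bytes;
        if (IsDataUri(src))
        {
            int comma = src.IndexOf(',');
            if (comma < 0) throw new FormatException("data: URI has no ',' separator");
            bytes = Convert.FromBase64String(src.Substring(comma + 1));
        }
        else
        {
            bytes = await http.GetByteArrayAsync(src);
        }
        File.WriteAllBytes(path, bytes);
    }
}
```

Inside the question's loop, ImageUrl would be passed as src and ImageName as alt; since Google often leaves alt empty, the fallback (for example "image" plus a counter) keeps file names unique.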

You can view the HTML markup with Ctrl+U in most browsers to see what the img src data looks like.

Upvotes: 0

Giorgi Chkhikvadze

Reputation: 684

The src attribute of an img tag is not always a URL to the image; the image content itself may be placed in the src attribute. For example, consider the following case, where a JPEG image is stored directly in src:

<img src="data:image/jpeg;base64,..." width="284" height="178">

You should check whether the src value starts with data: (which means the image content is stored directly instead of a URL), and if so, parse the content. The format is src="data:<media type>;base64,<data>"; in this example a JPEG image is stored as a base64 string. You can convert the base64 string and save it to a file like this:

// base64String must contain only the part after the comma of the data: URI
var bytes = Convert.FromBase64String(base64String);
File.WriteAllBytes("image.jpg", bytes);
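A data: URI has the general form data:[<media type>][;base64],<data> (RFC 2397), so instead of one hard-coded prefix the header before the comma can be split into the media type and the ;base64 flag. This is a sketch assuming only base64-encoded data: URIs need handling; DataUriImage is a hypothetical type:

```csharp
using System;

sealed class DataUriImage
{
    public string MediaType { get; }
    public byte[] Bytes { get; }

    private DataUriImage(string mediaType, byte[] bytes)
    {
        MediaType = mediaType;
        Bytes = bytes;
    }

    // Parses "data:<media type>;base64,<data>" into its media type and decoded bytes.
    public static DataUriImage Parse(string src)
    {
        if (src == null || !src.StartsWith("data:", StringComparison.OrdinalIgnoreCase))
            throw new ArgumentException("src is not a data: URI", nameof(src));

        int comma = src.IndexOf(',');
        if (comma < 0)
            throw new FormatException("data: URI has no ',' separator");

        // Header between "data:" and the comma, e.g. "image/jpeg;base64"
        string[] parts = src.Substring(5, comma - 5).Split(';');
        bool isBase64 = parts.Length > 1 &&
            parts[parts.Length - 1].Equals("base64", StringComparison.OrdinalIgnoreCase);
        if (!isBase64)
            throw new NotSupportedException("only base64-encoded data: URIs are handled here");

        // RFC 2397: an omitted media type defaults to text/plain
        string mediaType = parts[0].Length > 0 ? parts[0] : "text/plain";
        return new DataUriImage(mediaType, Convert.FromBase64String(src.Substring(comma + 1)));
    }
}
```

With this, `File.WriteAllBytes("image.jpg", DataUriImage.Parse(src).Bytes);` replaces the manual prefix stripping, and MediaType tells you which format was embedded.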

Of course, you will need to check the image format (JPEG, PNG, etc.) and save the decoded content to a file with the corresponding extension; the media type in the data: URI (for example image/jpeg) tells you which format it is.
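One way to pick the extension is a small lookup from media type, with a fallback that sniffs the first bytes of the decoded data (JPEG starts with FF D8 FF, PNG with the 8-byte signature 89 50 4E 47 0D 0A 1A 0A, GIF with "GIF8"). The table is a hand-picked assumption, not an exhaustive list, and ImageExtensions is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;

static class ImageExtensions
{
    // Hand-picked mapping; extend it if other media types show up.
    private static readonly Dictionary<string, string> ByMediaType =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            ["image/jpeg"] = ".jpg",
            ["image/png"] = ".png",
            ["image/gif"] = ".gif",
            ["image/webp"] = ".webp",
        };

    // Extension for a media type such as "image/gif", or null if it is not in the table.
    public static string ExtensionFor(string mediaType) =>
        mediaType != null && ByMediaType.TryGetValue(mediaType.Trim(), out string ext) ? ext : null;

    // Fallback: guess from the decoded bytes' signature ("magic number").
    public static string ExtensionFromBytes(byte[] b)
    {
        if (b.Length >= 3 && b[0] == 0xFF && b[1] == 0xD8 && b[2] == 0xFF) return ".jpg";
        if (b.Length >= 8 && b[0] == 0x89 && b[1] == 0x50 && b[2] == 0x4E && b[3] == 0x47
            && b[4] == 0x0D && b[5] == 0x0A && b[6] == 0x1A && b[7] == 0x0A) return ".png";
        if (b.Length >= 4 && b[0] == (byte)'G' && b[1] == (byte)'I'
            && b[2] == (byte)'F' && b[3] == (byte)'8') return ".gif";
        return ".bin";
    }
}
```

A typical call would be `string ext = ImageExtensions.ExtensionFor(mediaType) ?? ImageExtensions.ExtensionFromBytes(bytes);`, preferring the declared media type and sniffing only when it is unknown.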

Upvotes: 1
