Ahmed Sheikh

Reputation: 169

I can't scrape image URLs from a website

I am using this Python code to extract car details. I managed to extract everything except the image source (URL), which I am having trouble with. You can see my code below:

import requests
from bs4 import BeautifulSoup
URL = "https://www.cars.com/shopping/results/?dealer_id=&keyword=&list_price_max=&list_price_min=&makes[]=&maximum_distance=all&mileage_max=&page=1&page_size=100&sort=best_match_desc&stock_type=cpo&year_max=&year_min=&zip="
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html.parser')
cars = soup.find_all('div', class_='vehicle-card')
imag = soup.find_all('img')
name = []
mileage = []
dealer_name = []
rating = []
rating_count = []
price = []

for car in cars:
    #name
    name.append(car.find('h2').get_text())
    #mileage
    mileage.append(car.find('div', {'class':'mileage'}).get_text())
    #dealer_name
    dealer_name.append(car.find('div', {'class':'dealer-name'}).get_text())
    #rate
    rating.append(car.find('span', {'class':'sds-rating__count'}))
    #rate_count
    rating_count.append(car.find('span', {'class':'sds-rating__link'}).get_text())
    #price
    price.append(car.find('span', {'class':'primary-price'}).get_text())
    
for image in imag:
    #img
    img_url = image["src"]
    print(img_url)

When I run it, I get the following output and error:

/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
/images/placeholder_10x10.png
Traceback (most recent call last):
  File "c:\scraping\scrapeData.py", line 33, in <module>
    img_url = image["src"]
  File "C:\Users\PC\AppData\Local\Programs\Python\Python310\lib\site-packages\bs4\element.py", line 1486, in __getitem__        
    return self.attrs[key]
KeyError: 'src'

Does anyone have an idea what I am missing here?

Upvotes: 1

Views: 701

Answers (1)

Chris

Reputation: 16147

You need to debug your list. Evidently at least one element in it doesn't have a src attribute.

The following tries to assign each element's src to a variable; if that fails because the element has no src attribute, it prints the element.

for x in imag:
    try:
        d = x['src']
    except KeyError:
        print(x)

Output

<img alt="App Store download" class="app-store-button js-lazy-load" data-src="https://beta.cstatic-images.com/medium/in/v2/static/mobile-apps/app-store-badge-us-black-1.png"/>
<img alt="Google Play download" class="google-play-button js-lazy-load" data-src="https://beta.cstatic-images.com/medium/in/v2/static/mobile-apps/google-play-badge-us-1.png"/>

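As you can see, these tags have a data-src attribute instead of src. The js-lazy-load class suggests the images are loaded lazily, with the real URL kept in data-src.

This would parse them all. Whether they are useful to you is another story: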

for image in imag:
    try:
        img_url = image["src"]
    except KeyError:
        # Could just pass here if you don't want these
        img_url = image['data-src']
    print(img_url)
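
As a variation on the loop above, here is a minimal sketch that prefers data-src over src and resolves relative paths against the page URL. It assumes that lazily loaded tags keep the real URL in data-src while src holds only a placeholder, which the question's output suggests but does not confirm. pick_image_url is a hypothetical helper name, not part of BeautifulSoup, and the example.com URL is made up for the demo.

```python
from urllib.parse import urljoin


# Hypothetical helper, not part of BeautifulSoup or the original code.
def pick_image_url(attrs, base_url):
    """Return an absolute image URL from a tag's attributes, or None."""
    # Prefer data-src: on lazily loaded tags, src may only hold a placeholder.
    raw = attrs.get("data-src") or attrs.get("src")
    if not raw:
        # Neither attribute is present (or both are empty): nothing to return.
        return None
    # urljoin leaves absolute URLs untouched and resolves relative ones.
    return urljoin(base_url, raw)


if __name__ == "__main__":
    base = "https://www.cars.com/shopping/results/"
    # Plain dicts stand in for the attribute dictionaries of <img> tags.
    samples = [
        {"src": "/images/placeholder_10x10.png",
         "data-src": "https://example.com/car.jpg"},  # made-up URL
        {"src": "/images/placeholder_10x10.png"},
        {"alt": "no image attributes"},
    ]
    for attrs in samples:
        print(pick_image_url(attrs, base))
```

With the imag list from the question, this could be called as pick_image_url(image.attrs, URL) for each image, since a BeautifulSoup tag exposes its attributes as a dictionary in .attrs. Returning None instead of raising lets the caller decide whether to skip tags that carry no URL at all.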

Upvotes: 1
