Problem with image source scraping in loop using Python BS4

Question

When I try to scrape an image's source, all I get is a string starting with "data:type" followed by the base64 encoding of the image. All I want is the URL of the image.

I tried adding a condition to skip these values and then extract the URL, but that doesn't work: it just skips the entire image.

Please help.

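# Assumed setup: http is a urllib3.PoolManager and Bs is BeautifulSoup
import urllib3
from bs4 import BeautifulSoup as Bs

http = urllib3.PoolManager()


def get_product_data(url):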
    response = http.request("GET", url)
    check_conn_errors(response.status)
    bs_data = Bs(response.data, "html.parser")
    product_html = bs_data.find("div", {"class": PRODUCT_DATA_CLASS_NAME})
    imgs = product_html.find_all("img")
    img_link = ""

    for img in imgs:
        # .get() avoids a KeyError on <img> tags that have no src attribute
        src = img.get("src", "")
        if src.startswith("/"):
            img_link = PRODUCTS_URL_PREFIX + src
            break
        elif src.startswith(("http", "www", DEALER_NAME.split(' ')[0].lower())):
            img_link = src
            break

    if img_link == "":
        print("this doesn't work")

    # TODO: standardize description scraping

    desc_html = product_html.find_all("div", {"class": DESCRIPTION_CLASS_NAME})
    desc = ""

    for desc_part in desc_html:
        # Strip line-break characters from the description text
        desc += desc_part.text.replace('\r', '').replace('\n', '')
    return [desc, img_link]

Answers (1)
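
A "data:" value in src is often a placeholder inserted by a lazy-loading script, with the real URL kept in another attribute of the same tag. Below is a minimal sketch of that idea, not a definitive fix. The helper name pick_image_url and the attribute names in CANDIDATE_ATTRS are assumptions for illustration, so check the page's actual markup to see which attribute holds the URL. The helper only needs objects with a .get() method, which BeautifulSoup tags provide.

```python
from urllib.parse import urljoin

# Attribute names commonly used by lazy-loading scripts. These are guesses
# and should be checked against the actual page markup.
CANDIDATE_ATTRS = ("src", "data-src", "data-original", "data-lazy-src")


def pick_image_url(img_tags, base_url):
    """Return the first usable image URL, or "" if none is found."""
    for img in img_tags:
        for attr in CANDIDATE_ATTRS:
            value = (img.get(attr) or "").strip()
            # Skip missing attributes and inline base64 placeholders.
            if not value or value.startswith("data:"):
                continue
            # urljoin resolves "/path" and "//host/path" against base_url
            # and leaves absolute URLs unchanged.
            return urljoin(base_url, value)
    return ""
```

In get_product_data, img_link = pick_image_url(imgs, url) could stand in for the loop. If none of the candidate attributes holds a URL, the image is probably injected by JavaScript, which BeautifulSoup does not execute.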
