Anna Plym

Reputation: 83

BeautifulSoup scraping img

I'm trying to scrape this website for the captcha image link.

The image appears when I use the browser's Inspect Element, but it is missing from the HTML I get when scraping.

My goal is to get the img element.

Below is the code I tried.

import requests
from bs4 import BeautifulSoup

with requests.Session() as s:
    url = "https://myurl.com/"
    r = s.get(url)
    soup = BeautifulSoup(r.content, "html.parser")
    # print every img tag present in the static HTML
    for item in soup.find_all("img"):
        print(item)

Upvotes: 1

Views: 115

Answers (2)

Joseph Rajchwald

Reputation: 487

Like others have said, Selenium will help: it loads the page in a real browser, so the img is present in the page source and you can scrape it.

from selenium import webdriver
from bs4 import BeautifulSoup
import time

browser = webdriver.Firefox()
url = 'https://myurl.com/'
browser.get(url)
time.sleep(10)  # wait 10 seconds for the captcha to load
html = browser.page_source
browser.quit()  # close the browser once the HTML is captured
soup = BeautifulSoup(html, features='html.parser')

# print every img tag found in the rendered page
for img in soup.find_all('img'):
    print(img)

Returns:

<img alt="" id="yw1" src="/site/captcha/v/5dd3ccb47dd88/"/>
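The src in that output is relative to the site root, so it has to be joined with the page URL before it can be requested. A minimal sketch using only the standard library, assuming the placeholder page URL from the snippet above (`absolute_src` is a hypothetical helper name, not part of any library):

```python
from urllib.parse import urljoin


def absolute_src(page_url, src):
    """Resolve an img src (relative or absolute) against the page URL."""
    return urljoin(page_url, src)


# placeholder page URL and the src value printed above
captcha_link = absolute_src("https://myurl.com/", "/site/captcha/v/5dd3ccb47dd88/")
print(captcha_link)  # https://myurl.com/site/captcha/v/5dd3ccb47dd88/
```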

Upvotes: 0

KunduK

Reputation: 33384

If you open the Network tab in your browser's developer tools, you will find the link below, which returns the captcha image URL in JSON format. You don't need Selenium for that.

https://example.com/site/captcha/refresh/1/?_=1574163338269

You need to parse the response as JSON and then read the value of the url key.

import requests

with requests.Session() as s:
    url = "https://example.com/site/captcha/refresh/1/?_=1574163338269"
    # verify=False skips TLS certificate verification
    r = s.get(url, verify=False)
    img = r.json()  # parse the JSON body into a dict
    print(img['url'])
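The number after `_=` in that link looks like a millisecond Unix timestamp, which is a common cache-busting parameter, so a fresh value can probably be generated rather than reusing the one copied from the browser. That is an assumption about this site, not something confirmed here. The sketch below illustrates the idea offline: `refresh_url` and `captcha_url_from_json` are hypothetical helper names, and the sample payload is made up, assuming only the `url` key used above.

```python
import json
import time
from urllib.parse import urljoin

BASE = "https://example.com/"  # placeholder host from the snippet above


def refresh_url(now_ms=None):
    """Build the refresh link with a millisecond timestamp (assumed cache-buster)."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return urljoin(BASE, f"site/captcha/refresh/1/?_={now_ms}")


def captcha_url_from_json(text):
    """Read the 'url' key from the JSON body and make it absolute."""
    return urljoin(BASE, json.loads(text)["url"])


# Offline demonstration with a made-up payload; a real response may carry more keys.
sample = '{"url": "/site/captcha/v/5dd3ccb47dd88/"}'
print(refresh_url(1574163338269))
print(captcha_url_from_json(sample))
```

With requests, the text passed to `captcha_url_from_json` would be `r.text` from the GET on `refresh_url()`.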

[Screenshot: Network tab]

Upvotes: 1
