Tinny

Reputation: 9

How to crawl pictures with Python and Beautiful Soup

I want to crawl a website and download its images, but I get an error when I run this code and I don't know why.

import requests
from bs4 import BeautifulSoup
from urllib.request import urlretrieve


url = 'https://www.thsrc.com.tw/tw/TimeTable/SearchResult'
response = requests.get(url)
response.encoding = 'utf-8'

soup = BeautifulSoup(response.text, 'html.parser')
all_imgs = soup.find_all('img')

for index, img in enumerate(all_imgs):
    if index!=0:
        print(img['src'])
        image_path = 'https://www.thsrc.com.tw'+img['src']
        image_name = img['src'].split('/')[-1]
        print('image path is {}, file name is {}'.format(image_path, image_name))
        urlretrieve(image_path, 'save_image/'+image_name)

And this is what I received:

[Screenshot of the error output; image not available]

Upvotes: 0

Views: 342

Answers (1)

HedgeHog

Reputation: 25073

For some reason there is whitespace in img['src'], so you have to strip() it before building the URL:

image_path = 'https://www.thsrc.com.tw'+img['src'].strip()
Example
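import os
import requests
from bs4 import BeautifulSoup
from urllib.request import urlretrieve

url = 'https://www.thsrc.com.tw/tw/TimeTable/SearchResult'
response = requests.get(url)
response.encoding = 'utf-8'

# Name the parser explicitly to avoid bs4's "no parser specified" warning
soup = BeautifulSoup(response.text, 'html.parser')

# urlretrieve does not create missing directories
os.makedirs('save_image', exist_ok=True)

for img in soup.find_all('img'):
    # Strip once so both the URL and the file name are free of stray whitespace
    src = img['src'].strip()
    image_path = 'https://www.thsrc.com.tw' + src
    image_name = src.split('/')[-1]
    print('image path is {}, file name is {}'.format(image_path, image_name))
    urlretrieve(image_path, 'save_image/' + image_name)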
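
If you want something a bit more robust than string concatenation, here is a minimal sketch using only the standard library. It strips the whitespace, resolves the src against the site root with urljoin, and takes the file name from the URL path so a query string does not end up in it. The helper name resolve_image and the sample src values are made up for illustration; they are not taken from the page itself.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

BASE_URL = 'https://www.thsrc.com.tw'


def resolve_image(src, base_url=BASE_URL):
    """Return (absolute_url, file_name) for a raw src attribute value.

    Hypothetical helper, not part of requests or bs4.
    """
    # Remove leading/trailing spaces and newlines from the attribute value
    cleaned = src.strip()
    # urljoin handles both site-relative and already-absolute URLs
    absolute_url = urljoin(base_url, cleaned)
    # Take the last path segment, ignoring any query string or fragment
    file_name = posixpath.basename(urlsplit(absolute_url).path)
    return absolute_url, file_name


# Made-up src value with stray whitespace, as an example
print(resolve_image(' /images/logo.png \n'))
```

In the loop you could then write image_path, image_name = resolve_image(img['src']) and pass both to urlretrieve as before.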

Upvotes: 1
