Reputation: 9
I want to crawl a website and download its images, but I don't know why I get an error when I run this code.
import requests
from bs4 import BeautifulSoup
from urllib.request import urlretrieve

url = 'https://www.thsrc.com.tw/tw/TimeTable/SearchResult'
response = requests.get(url)
response.encoding = 'utf-8'
soup = BeautifulSoup(response.text, 'html.parser')
all_imgs = soup.find_all('img')
for index, img in enumerate(all_imgs):
    if index != 0:
        print(img['src'])
        image_path = 'https://www.thsrc.com.tw' + img['src']
        image_name = img['src'].split('/')[-1]
        print('image path is {}, file name is {}'.format(image_path, image_name))
        urlretrieve(image_path, 'save_image/' + image_name)
And this is what I received:
Upvotes: 0
Views: 342
Reputation: 25073
For some reason there is stray whitespace in img['src'], so you have to strip() it:
image_path = 'https://www.thsrc.com.tw'+img['src'].strip()
import requests
from bs4 import BeautifulSoup
from urllib.request import urlretrieve

url = 'https://www.thsrc.com.tw/tw/TimeTable/SearchResult'
response = requests.get(url)
response.encoding = 'utf-8'
# Name the parser explicitly so BeautifulSoup does not have to guess one.
soup = BeautifulSoup(response.text, 'html.parser')
for img in soup.find_all('img'):
    # Strip the stray whitespace once and reuse the clean value,
    # so the file name is clean as well as the URL.
    src = img['src'].strip()
    print(src)
    image_path = 'https://www.thsrc.com.tw' + src
    image_name = src.split('/')[-1]
    print('image path is {}, file name is {}'.format(image_path, image_name))
    # The save_image/ directory must already exist.
    urlretrieve(image_path, 'save_image/' + image_name)
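If you want something a bit more robust than string concatenation, here is a minimal sketch that strips the src value and then builds the URL with urllib.parse.urljoin from the standard library. The helper names clean_image_url and image_file_name are made up for illustration, and I am assuming the src values are site-relative paths like the ones on this page.

```python
from urllib.parse import urljoin

# Site root taken from the question.
BASE_URL = 'https://www.thsrc.com.tw'


def clean_image_url(src, base=BASE_URL):
    """Strip surrounding whitespace from src and resolve it against base."""
    return urljoin(base, src.strip())


def image_file_name(src):
    """Return the last path segment of src, without surrounding whitespace."""
    return src.strip().split('/')[-1]
```

Inside the loop you would then call urlretrieve(clean_image_url(img['src']), 'save_image/' + image_file_name(img['src'])). urljoin also copes with src values that are already absolute URLs, which plain concatenation does not.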
Upvotes: 1