I'm using BeautifulSoup to find and download images from a given website. However, the website contains images that aren't in the usual <img src="icon.gif"/> format.
For example, these are the ones causing me problems:
<form action="example.jpg">
<!-- <img src="big.jpg" /> -->
background-image:url("xine.png");
My code to find the images is:
import os
from urllib.parse import urljoin
from urllib.request import urlopen, urlretrieve
from bs4 import BeautifulSoup

webpage = "https://example.com/images/"
soup = BeautifulSoup(urlopen(webpage), "html.parser")
for img in soup.find_all('img'):
    img_url = urljoin(webpage, img['src'])
    file_name = img['src'].split('/')[-1]
    file_path = os.path.join("C:\\users\\images", file_name)
    urlretrieve(img_url, file_path)
I think I might have to use a regex, but hopefully I don't have to.
Thanks in advance.
Modify the path you pass to urlretrieve to specify exactly where you want the file to be saved:
# Raw string, so "\f" in "c:\files" is not read as a form-feed escape
file_path = os.path.join(r'c:\files\cw\downloads', file_name)
urlretrieve(img_url, file_path)
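The path handling can be pulled into a small helper. This is a minimal sketch, assuming you want each file named after the last segment of the URL path; local_path_for and the "index" fallback name are hypothetical. It drops any query string or fragment (so big.jpg?size=2 is saved as big.jpg) and decodes %-escapes such as %20.

```python
import os
from urllib.parse import unquote, urlsplit


def local_path_for(img_url, dest_dir):
    """Hypothetical helper: map an image URL to a file path inside dest_dir."""
    # urlsplit separates out the query and fragment: ".../big.jpg?size=2" -> path ".../big.jpg"
    url_path = urlsplit(img_url).path
    # Keep the last path segment and decode %-escapes; fall back to an
    # arbitrary placeholder name when the URL path ends in "/"
    file_name = unquote(url_path.rsplit("/", 1)[-1]) or "index"
    return os.path.join(dest_dir, file_name)


if __name__ == "__main__":
    # Raw string: in a normal literal, "\f" in "c:\files" would be a form feed
    dest = r"c:\files\cw\downloads"
    print(local_path_for("https://example.com/images/big.jpg?size=2", dest))
```

urlretrieve does not create missing directories, so calling os.makedirs(dest_dir, exist_ok=True) once before the download loop avoids a FileNotFoundError.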
Edit:
It looks like you are also trying to find img tags inside HTML comments. Building on "Find specific comments in HTML code using python", you can parse the text of each comment as HTML again and add any img tags it contains:
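import bs4  # needed for bs4.Comment; the question only imports BeautifulSoup
...
imgs = soup.find_all('img')
# Collect every HTML comment in the document
comments = soup.find_all(string=lambda text: isinstance(text, bs4.Comment))
for comment in comments:
    # Re-parse the comment's text as HTML so <img> tags inside it are found
    comment_soup = bs4.BeautifulSoup(comment, "html.parser")
    imgs.extend(comment_soup.find_all('img'))
for img in imgs:
    ...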
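The other two formats in the question, an image URL in a form's action attribute and a CSS background-image, are not img tags, so find_all('img') never sees them. As an alternative that needs only the standard library, here is a sketch that swaps BeautifulSoup for html.parser.HTMLParser and collects all four cases in one pass. A small regex is still needed for CSS url(...) values, because HTML parsers do not parse CSS. The names ImageURLCollector, find_image_urls, and IMAGE_EXT, and the rule that a form action counts as an image only if it ends in an image extension, are assumptions for illustration. Stylesheets loaded through <link rel="stylesheet"> are not fetched or scanned.

```python
import re
from html.parser import HTMLParser

# CSS url(...) value, with optional single or double quotes around it
CSS_URL = re.compile(r"""url\(\s*['"]?([^'")]+?)['"]?\s*\)""")
# Assumption: a <form action> only counts as an image if it ends in one of these
IMAGE_EXT = (".jpg", ".jpeg", ".png", ".gif", ".webp", ".svg")


class ImageURLCollector(HTMLParser):
    """Collects image URLs from img src, image-like form actions, CSS url() and comments."""

    def __init__(self):
        super().__init__()
        self.urls = []
        self._in_style = False  # True while inside a <style> element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.urls.append(attrs["src"])
        elif tag == "form" and (attrs.get("action") or "").lower().endswith(IMAGE_EXT):
            self.urls.append(attrs["action"])
        elif tag == "style":
            self._in_style = True
        # Inline CSS such as style="background-image:url('x.png')" on any element
        if attrs.get("style"):
            self.urls.extend(CSS_URL.findall(attrs["style"]))

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        # Text inside <style>...</style> is CSS, so scan it for url(...)
        if self._in_style:
            self.urls.extend(CSS_URL.findall(data))

    def handle_comment(self, data):
        # Parse the comment body as HTML too, so commented-out <img> tags are found
        inner = ImageURLCollector()
        inner.feed(data)
        inner.close()
        self.urls.extend(inner.urls)


def find_image_urls(html):
    """Return candidate image URLs (possibly relative) in document order."""
    parser = ImageURLCollector()
    parser.feed(html)
    parser.close()
    return parser.urls


if __name__ == "__main__":
    page = (
        '<form action="example.jpg"></form>\n'
        '<!-- <img src="big.jpg" /> -->\n'
        "<div style='background-image:url(\"xine.png\");'></div>\n"
        '<img src="icon.gif"/>\n'
    )
    print(find_image_urls(page))
```

The returned values can be relative, so they still go through urljoin(webpage, src) before being passed to urlretrieve, exactly as in the question's loop.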