Reputation: 11
This exercise is from the book Web Scraping with Python by Ryan Mitchell (Chinese edition, p. 23), and I find that the other similar examples fail the same way. Can anyone tell me how to fix it? Thank you in advance. I have posted a picture. The code is as follows:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re
html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bsObj = BeautifulSoup(html,"html.parser")
images = bsObj.findALL("img",{"src":re.compile("\.\.\/img\/gifts\/img.*\.jpg")})
for image in images:
    print(image["src"])
Upvotes: 0
Views: 488
Reputation: 180391
The problem is *findALL*: the last two l's should be lowercase, i.e. findAll. Better still, use *find_all*, since findAll is deprecated:
images = bsObj.find_all("img", {"src": re.compile(r"\.\./img/gifts/img.*\.jpg")})
Which will give you:
../img/gifts/img1.jpg
../img/gifts/img2.jpg
../img/gifts/img3.jpg
../img/gifts/img4.jpg
../img/gifts/img6.jpg
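To see the difference without fetching the page, here is a minimal sketch that runs against a small inline snippet. The markup and the names SAMPLE_HTML, misspelled and sources are made up for illustration and are not taken from page3.html. It relies on BeautifulSoup treating an unknown attribute name as a tag lookup, which is why the misspelled method comes back as None and then fails when called.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
SAMPLE_HTML = """
<table>
  <tr><td><img src="../img/gifts/img1.jpg"></td></tr>
  <tr><td><img src="../img/gifts/img2.jpg"></td></tr>
  <tr><td><img src="../img/logo.jpg"></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# An unknown attribute is looked up as a tag name, so this is the same as
# soup.find("findALL"): no such tag exists, and the result is None.
misspelled = soup.findALL

# The correctly spelled method returns every <img> whose src matches the regex.
pattern = re.compile(r"\.\./img/gifts/img.*\.jpg")
sources = [img["src"] for img in soup.find_all("img", {"src": pattern})]

print(misspelled)
for src in sources:
    print(src)
```

Calling the misspelled name, as in soup.findALL("img"), would then raise a TypeError, because None is not callable.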
Unless other images on the page also have ../img/gifts/img in their path, you could replace the regex with a CSS selector that finds images whose src attribute contains ../img/gifts/img:
images = bsObj.select('img[src*="../img/gifts/img"]')
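The caveat about other images can be made concrete with a short sketch, again using invented markup (SAMPLE_HTML and the img7.png entry are assumptions for illustration). The substring selector matches any src containing ../img/gifts/img, while the regex also requires a .jpg ending, so the two can return different results.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup; img7.png is included only to show where the two differ.
SAMPLE_HTML = """
<div>
  <img src="../img/gifts/img1.jpg">
  <img src="../img/gifts/img2.jpg">
  <img src="../img/gifts/img7.png">
  <img src="../img/gifts/logo.jpg">
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# CSS attribute selector: src contains the given substring.
# The value is quoted because it contains dots and slashes.
by_selector = [img["src"] for img in soup.select('img[src*="../img/gifts/img"]')]

# Regex filter: same prefix, but the path must also end in .jpg.
pattern = re.compile(r"\.\./img/gifts/img.*\.jpg")
by_regex = [img["src"] for img in soup.find_all("img", {"src": pattern})]

print(by_selector)
print(by_regex)
```

Here the selector also picks up img7.png, which the regex skips. Neither approach matches logo.jpg.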
Upvotes: 1