I have a problem finding href values with BeautifulSoup:
from urllib import urlopen
from bs4 import BeautifulSoup
import re
html = urlopen("https://www.google.pl/search?q=sprz%C4%99t+dla+graczy&client=ubuntu&ei=4ypXWsi_BcLZwQKGroW4Bg&start=0&sa=N&biw=741&bih=624")
bsObj = BeautifulSoup(html)
for link in bsObj.find("h3", {"class":"r"}).findAll("a"):
    if 'href' in link.attrs:
        print(link.attrs['href'])
Every time I run it, I get this error:
AttributeError: 'NoneType' object has no attribute 'findAll'
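The error can be reproduced offline. This is a minimal sketch, assuming bs4 is installed and using a hypothetical HTML string in place of the live Google page: find() returns None when no element matches, so chaining findAll() onto it raises the AttributeError. The helper name hrefs_in_first_result is made up for illustration; it checks for None before going further.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a page that has no <h3 class="r">
html = "<html><body><h3 class='other'><a href='/x'>x</a></h3></body></html>"
soup = BeautifulSoup(html, "html.parser")

heading = soup.find("h3", {"class": "r"})
print(heading)  # None: nothing matched, so heading.findAll(...) would fail


def hrefs_in_first_result(soup):
    """Hypothetical helper: hrefs inside the first h3.r, or [] if there is none."""
    heading = soup.find("h3", {"class": "r"})
    if heading is None:
        return []
    # href=True skips <a> tags without an href attribute
    return [a["href"] for a in heading.find_all("a", href=True)]


print(hrefs_in_first_result(soup))  # []
```

With markup that does contain an h3.r heading, the same helper returns that heading's link targets instead of crashing.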
Answer:
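find() returns None when the page contains no matching <h3 class="r"> element, and that is what happens here: Google serves different markup to urllib's default User-Agent. You'll have to change the User-Agent string to something other than urllib's default.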
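from urllib2 import urlopen, Request
from bs4 import BeautifulSoup

url = "https://www.google.pl/search?q=sprz%C4%99t+dla+graczy&client=ubuntu&ei=4ypXWsi_BcLZwQKGroW4Bg&start=0&sa=N&biw=741&bih=624"
# Send a browser-like User-Agent instead of urllib2's default "Python-urllib/2.x"
html = urlopen(Request(url, headers={'User-Agent':'Mozilla/5'})).read()
# Name the parser explicitly so BeautifulSoup doesn't have to guess one
bsObj = BeautifulSoup(html, 'html.parser')
# href=True keeps only <a> tags that actually have an href attribute
for link in bsObj.find("h3", {"class":"r"}).findAll("a", href=True):
    print(link['href'])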
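The code above targets Python 2 (urllib2). Under Python 3, Request and urlopen live in urllib.request. Below is a minimal sketch of building the same request; the function names build_request and fetch are hypothetical, "Mozilla/5" is just the placeholder string used above, and Google's markup (including the h3.r class) may well have changed since, so treat the selector as an assumption.

```python
from urllib.request import Request, urlopen

URL = ("https://www.google.pl/search?q=sprz%C4%99t+dla+graczy&client=ubuntu"
       "&ei=4ypXWsi_BcLZwQKGroW4Bg&start=0&sa=N&biw=741&bih=624")


def build_request(url, user_agent="Mozilla/5"):
    # Replace urllib's default "Python-urllib/3.x" User-Agent header
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    # Performs the network call; returns raw bytes for BeautifulSoup to parse
    with urlopen(build_request(url)) as response:
        return response.read()


req = build_request(URL)
# urllib stores header names via str.capitalize(), hence "User-agent"
print(req.get_header("User-agent"))  # Mozilla/5
```

Calling fetch(URL) and passing the result to BeautifulSoup(html, "html.parser") then mirrors the Python 2 version.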
Also note that this expression selects only the links inside the first h3.r element, because find() stops at the first match. To select the links under every h3.r heading on the page, use a CSS selector instead:
links = bsObj.select("h3.r a[href]")
for link in links:
    print(link['href'])
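The difference between the two expressions shows up on a small static document. This sketch assumes bs4 (with its soupsieve dependency for select()) is installed and uses hypothetical markup with several h3.r headings.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with three result headings, the last without an href
html = """
<h3 class="r"><a href="/first">First</a></h3>
<h3 class="r"><a href="/second">Second</a></h3>
<h3 class="r"><a name="no-href">No link target</a></h3>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching <h3>, so only its links are collected
first_only = [a["href"] for a in soup.find("h3", {"class": "r"}).find_all("a", href=True)]

# select() applies the CSS selector across the whole document;
# a[href] drops anchors without an href attribute
all_links = [a["href"] for a in soup.select("h3.r a[href]")]

print(first_only)  # ['/first']
print(all_links)   # ['/first', '/second']
```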