Reputation: 2570
Hi, this is my code to retrieve the first topic on the ycombinator website. When I run the code I get
AttributeError: 'NoneType' object has no attribute 'find'
for the line
level2= data.level1.find('table',attrs = {'id':'hnmain'})
The topics are nested deep within various tags, which is why I proceeded as below. I'm just doing this for practice and it's my first day, so I know this might not be the best way to code it. I just want to know how to get past the error.
import requests
from bs4 import BeautifulSoup
response1= requests.get('https://news.ycombinator.com/')
response = response1.text
data = BeautifulSoup(response,"html.parser")
level1= data.body.find('centre')
level2= data.level1.find('table',attrs = {'id':'hnmain'})
level3= data.level2.find('tbody')
level4= data.level3.find('tr')
level5= data.level4.find('td')
level6= data.level5.find('table.itemlist')
level7= data.level6.find('tbody')
level8= data.level7.find('tr#15426209.athing')
level9= data.level8.find('td.title')
level10= data.level9.find('a.storylink')
print(level10.text)
Upvotes: 3
Views: 7651
Reputation: 1322
The source of the error is as follows.
From the Beautiful Soup documentation: "If find() can't find anything, it returns None."
It definitely can't find the centre tag that you want, because the spelling is incorrect: the HTML tag is center.
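Also, you are going to want to remove data from the lower levels, as the first level already returns a Tag object. Writing data.level1 asks Beautiful Soup for a tag named level1, which does not exist, so it evaluates to None. Call find on the variable directly instead:
level2= level1.find('table',attrs = {'id':'hnmain'})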
I am still getting caught up after level 3. I grepped the return value and tbody didn't show up anywhere, so I am not sure where the tree actually swerved off.
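The points above can be reproduced offline with a minimal sketch. The HTML string below is a hypothetical, simplified stand-in for the real page (the live markup is larger and may differ), and the variable names are only illustrative. It shows find() returning None for a misspelled tag, the attribute lookup soup.level1 also returning None, and chaining from the Tag object working. The missing tbody may be explained by browsers inserting that element when they display a table, even though the raw HTML does not contain it; that is an assumption about this page, not something confirmed here.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page is larger and may differ.
html = """
<html><body><center>
<table id="hnmain">
  <tr><td>
    <table class="itemlist">
      <tr class="athing"><td class="title">
        <a class="storylink" href="https://example.com/a">First story</a>
      </td></tr>
    </table>
  </td></tr>
</table>
</center></body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# A misspelled tag name matches nothing, so find() returns None.
missing = soup.body.find("centre")
print(missing)

# Attribute access looks for a tag with that name; there is no <level1> tag.
print(soup.level1)

# Chain from the Tag objects themselves, not from the soup.
level1 = soup.body.find("center")
level2 = level1.find("table", attrs={"id": "hnmain"})

# This source has no <tbody>, so asking for one also returns None.
print(level2.find("tbody"))

# find() searches all descendants, so the intermediate levels can be skipped.
link = level2.find("a", attrs={"class": "storylink"})
first_title = link.text if link is not None else None
print(first_title)
```

Checking each result for None before calling find on it, as in the last lines, avoids the AttributeError when a level is missing.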
Upvotes: 0
Reputation: 895
I think you're getting the error because of the data.body portion. I've never seen it done that way, to be honest.
Here's a modified version of your code that works:
import requests
from bs4 import BeautifulSoup
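r = requests.get('https://news.ycombinator.com')
# 'lxml' needs the lxml package installed; the built-in 'html.parser' is an alternative
soup = BeautifulSoup(r.text, 'lxml')
# print(soup.prettify())
# Collect [title, url] for every story link on the page
stories = []
for a in soup.find_all('a', attrs={'class': 'storylink'}):
    stories.append([a.text, a['href']])
print(stories[0])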
[u'Using Binary Diffing to Discover Windows Kernel Memory Disclosure Bugs', 'https://googleprojectzero.blogspot.com/2017/10/using-binary-diffing-to-discover.html']
I've commented out soup.prettify(), but you can uncomment it to see what it does: it shows you the source code of the page in a nicely organized way.
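For comparison, here is a small offline sketch of the same idea using CSS selectors. The question passes strings such as 'a.storylink' to find(), which treats its first argument as a tag name, so those calls match nothing. Beautiful Soup's select() and select_one() accept CSS selectors instead. The sample markup is hypothetical and only stands in for the downloaded page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
html = """
<table class="itemlist">
  <tr class="athing"><td class="title">
    <a class="storylink" href="https://example.com/a">First story</a>
  </td></tr>
  <tr class="athing"><td class="title">
    <a class="storylink" href="https://example.com/b">Second story</a>
  </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# find() looks for a tag literally named "a.storylink", which does not exist.
print(soup.find("a.storylink"))

# select_one() returns the first element matching a CSS selector, or None.
first = soup.select_one("td.title a.storylink")

# select() returns a list of all matches, like find_all().
stories = [[a.text, a["href"]] for a in soup.select("a.storylink")]

# Guard against an empty result before indexing.
print(stories[0] if stories else "no stories found")
```

Either approach works; find_all with an attrs dictionary and select with a CSS selector return the same links for this sample.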
Upvotes: 3