West

Reputation: 2570

I'm getting AttributeError: 'NoneType' object has no attribute 'find'

Hi, this is my code to retrieve the first topic on the ycombinator website. When I run the code I get:

AttributeError: 'NoneType' object has no attribute 'find' for the line
level2= data.level1.find('table',attrs = {'id':'hnmain'})

The topics are nested deep within various tags, which is why I proceeded as below. I'm just doing this for practice, and it's my first day, so I know this might not be the best way to code it. I just want to know how to get past the error.

import requests
from bs4 import BeautifulSoup
response1= requests.get('https://news.ycombinator.com/')
response = response1.text

data = BeautifulSoup(response,"html.parser")

level1= data.body.find('centre')
level2= data.level1.find('table',attrs = {'id':'hnmain'})
level3= data.level2.find('tbody')
level4= data.level3.find('tr')
level5= data.level4.find('td')
level6= data.level5.find('table.itemlist')
level7= data.level6.find('tbody')
level8= data.level7.find('tr#15426209.athing')
level9= data.level8.find('td.title')
level10= data.level9.find('a.storylink')
print(level10.text)

Upvotes: 3

Views: 7651

Answers (2)

0TTT0

Reputation: 1322

The source of the error:

From the Beautiful Soup documentation: "If find() can't find anything, it returns None."

It definitely can't find the centre tag that you asked for, because the spelling is incorrect; the HTML tag is center.
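A minimal sketch of that behaviour, assuming a small inline HTML string as a stand-in for the page (the markup here is made up for illustration and is not the real Hacker News source):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page.
html = "<html><body><center><table id='hnmain'></table></center></body></html>"
soup = BeautifulSoup(html, "html.parser")

missing = soup.body.find("centre")  # misspelled tag name: nothing matches
found = soup.body.find("center")    # correct tag name

print(missing)     # None
print(found.name)  # center

# Checking for None before chaining avoids the AttributeError.
if missing is None:
    print("no <centre> tag in the document")
```

Checking each result for None before calling find() on it again should make it easier to see which level of the chain fails.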

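You will also want to remove the data. prefix on the lower levels, since level1 already holds a Tag object. Writing data.level1 makes Beautiful Soup look for a tag named level1 instead of using your variable, and that lookup returns None. For example: level2 = level1.find('table', attrs={'id': 'hnmain'})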
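To illustrate the difference between the attribute lookup and the variable, here is a small sketch on made-up inline HTML (the markup is an assumption for illustration only):

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely shaped like the nesting in the question.
html = "<body><center><table id='hnmain'><tr><td>story</td></tr></table></center></body>"
soup = BeautifulSoup(html, "html.parser")

level1 = soup.body.find("center")

# soup.level1 searches the document for a <level1> tag; it has nothing
# to do with the Python variable named level1 defined above.
print(soup.level1)  # None

# Chaining from the variable searches inside the tag that was found.
level2 = level1.find("table", attrs={"id": "hnmain"})
print(level2["id"])  # hnmain
```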

I am still getting stuck after level 3. I grepped the returned HTML and tbody didn't show up anywhere, so I am not sure where the tree actually diverges from what you expected.
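One possible explanation, offered as an assumption rather than something verified against the live page: browser developer tools show a tbody element that the browser adds while building the DOM, whereas the raw HTML may not contain one, and html.parser does not add it. A sketch on made-up markup:

```python
from bs4 import BeautifulSoup

# Hypothetical raw HTML without any <tbody>, as a server might send it.
html = "<table id='hnmain'><tr><td>first cell</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

table = soup.find("table", attrs={"id": "hnmain"})

# html.parser keeps the markup as written, so there is no tbody to find.
print(table.find("tbody"))  # None

# Skipping the tbody step and going straight to the row works.
row = table.find("tr")
print(row.td.text)  # first cell
```

If that is what is happening here, dropping the tbody levels from the chain should let the later lookups proceed.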

Upvotes: 0

Evan Nowak

Reputation: 895

I think you're getting the error because of the data.body portion. I've never seen it done that way tbh.

Here's a modified version of your code that works:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://news.ycombinator.com')

soup = BeautifulSoup(r.text, 'lxml')

# print(soup.prettify())

# Collect [title, url] pairs for every story link on the page
stories = []

for a in soup.find_all('a', attrs={'class': 'storylink'}):
    stories.append([a.text, a['href']])

print(stories[0])

['Using Binary Diffing to Discover Windows Kernel Memory Disclosure Bugs', 'https://googleprojectzero.blogspot.com/2017/10/using-binary-diffing-to-discover.html']

I've commented out soup.prettify(), but you can uncomment it and see what it does - it shows you the source code of the page in a nicely organized way.
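The question also passes strings such as 'a.storylink' to find(), which treats its first argument as a tag name rather than a CSS selector. As a rough sketch of the selector-based alternative, using select_one() on made-up markup (the class names come from the question; the rows and URLs are invented for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical markup reusing the class names from the question.
html = """
<table class="itemlist">
  <tr class="athing"><td class="title">
    <a class="storylink" href="https://example.com/a">First story</a>
  </td></tr>
  <tr class="athing"><td class="title">
    <a class="storylink" href="https://example.com/b">Second story</a>
  </td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# find() looks for a tag literally named "a.storylink" and finds none.
print(soup.find("a.storylink"))  # None

# select_one() accepts a CSS selector and returns the first match or None.
first = soup.select_one("table.itemlist a.storylink")
if first is not None:
    print(first.text, first["href"])  # First story https://example.com/a
```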

Upvotes: 3
