I have been working on a script that gets some data from a URL using Python and the modules BeautifulSoup and requests. I divided the code into a few functions to make it more modular, and even though I expect the code to work, it sometimes fails with an AttributeError. To be more precise, this is the error I get:
Traceback (most recent call last):
  File "stats_tracker.py", line 140, in <module>
    print_interval(60)
  File "stats_tracker.py", line 108, in print_interval
    current_data = parse_table(get_data(URL))
  File "stats_tracker.py", line 30, in parse_table
    rows = table.find_all('tr')
AttributeError: 'NoneType' object has no attribute 'find_all'
The part of the code which produces the error is:
import time

import requests
from bs4 import BeautifulSoup


def get_data(URL):
    result = requests.get(URL)
    src = result.content
    soup = BeautifulSoup(src, 'html.parser')
    table = soup.find("table", {"class": "stats_table"})
    if table is not None:
        return table
    else:
        time.sleep(1)
        get_data(URL)


def parse_table(table):
    data = []
    rows = table.find_all('tr')  # <----- 30th line from the traceback
    for row in rows:
        cols = row.find_all('td')
        cols = [ele.text.strip() for ele in cols]
        data.append([ele for ele in cols if ele])
    for row in data:
        row[0] = int(row[0].replace(',', ''))
        row.reverse()
    return data
If I understand the error correctly, it means that table, on which find_all is called, is None, so the call fails because None has no such method. What I don't understand is how the argument passed to parse_table, which is the value returned from get_data(URL), can possibly be None.
Why do I think so? After assigning the result of soup.find(...) to table, I check whether it is not None, because I want to avoid the AttributeError. If it is not None, I return table, which is later passed to parse_table, so at that point table.find_all('tr') should work as expected. What if something went wrong and table is None? Then I call the sleep function and call the same function again, so the website gets fetched once more and, hopefully, this time the result is correct.
I tested the code on two sources: a downloaded copy of the website as HTML, and the live website. With the downloaded copy, I could not get any error to appear. When I run the script against the live website, it works fine for about a day, fetching all the information every 60 seconds, but then it crashes with the error shown above.
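The intermittent failure can be reproduced without any network access. The sketch below keeps the control flow of the posted get_data but replaces requests.get plus soup.find with a hypothetical fetch_table callable that returns None to simulate a page without the table; get_data_as_posted and make_fetcher are illustrative names, not part of the original script.

```python
import time
from typing import Callable, Iterable, Optional


def get_data_as_posted(fetch_table: Callable[[], Optional[str]], delay: float = 0.0):
    """Same control flow as the posted get_data.

    fetch_table is a hypothetical stand-in for requests.get(URL) followed by
    soup.find("table", {"class": "stats_table"}).
    """
    table = fetch_table()
    if table is not None:
        return table
    else:
        time.sleep(delay)
        # The retry runs, but its result is thrown away, so this call
        # falls off the end of the function and returns None.
        get_data_as_posted(fetch_table, delay)


def make_fetcher(responses: Iterable[Optional[str]]) -> Callable[[], Optional[str]]:
    """Hypothetical helper: hands out the given values, one per call."""
    it = iter(responses)
    return lambda: next(it)


# First attempt finds the table: the caller gets it back.
ok = get_data_as_posted(make_fetcher(["<table>"]))
# First attempt misses, the retry finds it: the caller still gets None.
flaky = get_data_as_posted(make_fetcher([None, "<table>"]))
print(ok)     # <table>
print(flaky)  # None
```

In the toy, the failure only appears when the first attempt comes back without the table. On a live site that is presumably a rare event (an error page, a rate-limit response or a brief outage are plausible causes, not confirmed ones), which would fit a script that runs for about a day before crashing.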
My question is: how can I improve the code to avoid this error, which I keep getting over and over again (on multiple devices)? Is the if ... is not None check not enough for that purpose, and if not, why? What can cause the code to fail after more than a day of working, but not earlier?
The problem is that if table is None on the first attempt, the get_data function does not return anything, which in Python means it implicitly returns None.
It calls itself again, but the return value of that second call is never used. What you should do is return the value from the second get_data call:
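def get_data(URL):
    result = requests.get(URL)
    src = result.content
    soup = BeautifulSoup(src, 'html.parser')
    table = soup.find("table", {"class": "stats_table"})
    if table is not None:
        return table
    else:
        time.sleep(1)
        return get_data(URL)  # return added here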
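Even with the return added, each failed attempt adds another stack frame, so a page that keeps coming back without the table would eventually raise RecursionError (CPython's default limit, reported by sys.getrecursionlimit(), is 1000). A loop with a bounded number of attempts avoids that. This is a sketch of an alternative, not the code from the answer above: max_attempts, delay and raising RuntimeError on exhaustion are assumptions, and fetch_table is a hypothetical stand-in for the requests.get plus soup.find step so the logic can be tried offline.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def get_with_retries(fetch_table: Callable[[], Optional[T]],
                     max_attempts: int = 5,
                     delay: float = 1.0) -> T:
    """Call fetch_table until it returns something other than None.

    fetch_table is a hypothetical stand-in for requests.get(URL) followed by
    soup.find("table", {"class": "stats_table"}).
    """
    for attempt in range(1, max_attempts + 1):
        table = fetch_table()
        if table is not None:
            return table  # the successful path always returns explicitly
        if attempt < max_attempts:
            time.sleep(delay)  # wait before fetching the page again
    # Assumption: failing loudly is preferable to handing None to parse_table.
    raise RuntimeError(f"table not found after {max_attempts} attempts")
```

In the script, fetch_table could be something like lambda: BeautifulSoup(requests.get(URL).content, 'html.parser').find("table", {"class": "stats_table"}). parse_table then either receives a real table or is never called, and the RuntimeError names the actual problem instead of surfacing later as an AttributeError on None.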