whiskeyo

Reputation: 914

Unexpected AttributeError happens even though it should not

I have been working on a Python script that fetches data from a URL using the BeautifulSoup and requests modules. I split the code into a few functions for modularity, and although I expect it to work, it sometimes fails with an AttributeError. This is the exact error I get:

Traceback (most recent call last):
  File "stats_tracker.py", line 140, in <module>
    print_interval(60)
  File "stats_tracker.py", line 108, in print_interval
    current_data = parse_table(get_data(URL))
  File "stats_tracker.py", line 30, in parse_table
    rows = table.find_all('tr')
AttributeError: 'NoneType' object has no attribute 'find_all'

The part of the code which produces the error is:

def get_data(URL):
    result = requests.get(URL)
    src = result.content
    soup = BeautifulSoup(src, 'html.parser')
    table = soup.find("table", {"class": "stats_table"})

    if table is not None:
        return table
    else:
        time.sleep(1)
        get_data(URL)

def parse_table(table):
    data = []
    rows = table.find_all('tr')  # <----- line 30 in the traceback
    for row in rows:
        cols = row.find_all('td')
        cols = [ele.text.strip() for ele in cols]
        data.append([ele for ele in cols if ele])

    for row in data:
        row[0] = int(row[0].replace(',', ''))
        row.reverse()

    return data

If I understand the error correctly, table, on which find_all is called, is None, and None has no such method. I do not see how the argument passed to parse_table, which is the value returned from get_data(URL), can be None.

Here is how I see it. After assigning the result of soup.find(...) to table, I check that it is not None, because I want to avoid the AttributeError. If it is not None, I return table, which is then passed to parse_table, so at this point table.find_all('tr') should work as expected. And if something went wrong and table is None? I call sleep and then call the same function again, so the page is fetched once more and may be correct this time.

I tested the code on two sources: a downloaded HTML copy of the page and the live website. With the downloaded copy, I could not reproduce the error. Against the live website, the script works fine for about a day, fetching all the information every 60 seconds, but then it crashes with the error pasted above.

My question is: how can I improve the code to avoid this error, which I keep getting (on multiple devices)? Is the if ... is not None check not enough for that purpose, and if not, why? What can cause the code to fail after more than a day of running, but not earlier?

Upvotes: 0

Views: 29

Answers (1)

Luke B

Reputation: 2121

The problem is that if table is None on the first attempt, get_data returns nothing, which in Python means it implicitly returns None.

The function does call itself again, but the return value of that second call is never used. What you should do is return the value from the recursive get_data call:

if table is not None:
    return table
else:
    time.sleep(1)
    return get_data(URL) # return added here
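
The same retry can also be written as a loop, which returns from a single place and cannot forget to pass a result back up through a recursive call. The following is only a sketch under assumptions: fetch_with_retries, broken_retry, fetch_table, max_attempts, and delay are hypothetical names that are not part of the original script, and the stand-in fetch_table callable replaces the requests and BeautifulSoup lookup so that the example runs without a network connection. broken_retry is included only to reproduce the missing-return behaviour described above.

```python
import time


def fetch_with_retries(fetch_table, max_attempts=5, delay=1.0):
    """Call fetch_table() until it returns something other than None.

    fetch_table is a hypothetical zero-argument callable; in the original
    script it would wrap requests.get(...) and soup.find("table", ...).
    """
    for attempt in range(1, max_attempts + 1):
        table = fetch_table()
        if table is not None:
            return table
        if attempt < max_attempts:
            time.sleep(delay)  # wait before trying again
    # Fail loudly instead of handing None to the caller.
    raise RuntimeError(f"no table found after {max_attempts} attempts")


def broken_retry(fetch_table):
    """Mirrors the bug in the question: the recursive result is dropped."""
    table = fetch_table()
    if table is not None:
        return table
    else:
        broken_retry(fetch_table)  # missing return, so the caller gets None


# Simulated fetches: the first one misses, the second one finds the table.
responses = iter([None, "<table>"])
print(broken_retry(lambda: next(responses)))  # prints None

responses = iter([None, "<table>"])
print(fetch_with_retries(lambda: next(responses), delay=0))  # prints <table>
```

In the question's script, fetch_table could be a small function that performs the request and returns the result of soup.find(...). Raising an exception after the last attempt is a design choice of this sketch, made so that parse_table is never handed None; the caller would need to decide how to handle it.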

Upvotes: 1
