bzh

Reputation: 27

Scraping Tables in Python Using Beautiful Soup: AttributeError: 'NoneType' object has no attribute 'find_all'

I am working on scraping the two tables from the webpage: https://www.transfermarkt.com/premier-league/legionaereeinsaetze/wettbewerb/GB1/plus/?option=spiele&saison_id=2017&altersklasse=alle

I want data for many countries and years, so I have set up a list of league URLs, one per country.

Here is my code:

for l in range(0, len(league_urls)):
    time.sleep(0.5)
    #The second loop is for each year we want to scrape
    for n in range(2007,2020):
        time.sleep(0.5)
        df_soccer1 = None
        url = league_urls[l] + str(n) + str('&altersklasse=alle')
        headers = {"User-Agent":"Mozilla/5.0"}
        response = requests.get(url, headers=headers, verify=False)
        time.sleep(0.5)
        soup = BeautifulSoup(response.text, 'html.parser')

        #Table 1 with information about the value
        table = soup.find("table", {"class" : "items"})

        team = []
        players_used = []
        minutes_nonforeign = []
        minutes_foreign = []

        for row in table.find_all('tr')[1:]:
            try:
                col = row.find_all('td')
                team_ = col[1].text
                players_used_ = col[2].text
                minutes_nonforeign_ = col[3].text
                minutes_foreign_ = col[4].text
                team.append(team_)
                players_used.append(players_used_)
                minutes_nonforeign.append(minutes_nonforeign_)
                minutes_foreign.append(minutes_foreign_)
            except:
                team.append('')
                players_used.append('')
                minutes_nonforeign.append('')
                minutes_foreign.append('')

        team = [elem.replace('\n','').replace('\xa0','').strip() for elem in team]
        
        #Table 2 with information about placement, goals and points
        df_soccer2 = None

        table2 = soup.find("div", {"class" : "box tab-print"})

        team2 = []
        place = []
        matches = []
        difference = []
        pts = []

        for row in table2.find_all('tr'):
            try:
                col = row.find_all('td')
                team2_ = col[2].text
                place_  = col[0].text
                matches_ = col[3].text
                difference_ = col[4].text
                pts_ = col[5].text
                team2.append(team2_)
                place.append(place_)
                matches.append(matches_)
                difference.append(difference_)
                pts.append(pts_)
            except:
                team2.append('')
                place.append('')
                matches.append('')
                difference.append('')
                pts.append('')
               

        team2 = [elem.replace('\n','').replace('\xa0','').strip() for elem in team2]

        df_soccer1 = pd.DataFrame({'Team': team[1:], 'Season': [n]*(len(team)-1), 'Players used': players_used[1:], 
                                    'Minutes nonforeign': minutes_nonforeign[1:], 'Minutes foreign': minutes_foreign[1:]})
        
        df_soccer2 = pd.DataFrame({'Team': team2, 'Place': place, 'Matches': matches, 'Difference': difference,
                                     'Points': pts})

I get this error when scraping the first table:

AttributeError                            Traceback (most recent call last)
<ipython-input-46-b4cd681f68e8> in <module>
     21         minutes_foreign = []
     22 
---> 23         for row in table.find_all("tr")[1:]:
     24             try:
     25                 col = row.find_all('td')

AttributeError: 'NoneType' object has no attribute 'find_all'

Note that league_urls is a long list of URLs.

I have used similar code on another part of the site and it works fine. I can't figure out why it fails on this one.

In addition, when I run the code on a single URL, it works. Could the problem be that I am looping across 12 years for 55 different URLs?

Upvotes: 0

Views: 196

Answers (1)

QHarr

Reputation: 84465

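soup.find() returns None when the page has no matching element, so test whether table is None before calling find_all() on it, e.g.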

import requests
from bs4 import BeautifulSoup

url = 'https://www.transfermarkt.com/remier-liga/legionaereeinsaetze/wettbewerb/RU1/plus/?option=spiele&saison_id=2011&altersklasse=alle'
headers = {"User-Agent":"Mozilla/5.0"}
response = requests.get(url, headers=headers, verify=False)
#time.sleep(0.5)
soup = BeautifulSoup(response.text, 'html.parser')

#Table 1 with information about the value
table = soup.find("table", {"class" : "items"})

team = []
players_used = []
minutes_nonforeign = []
minutes_foreign = []

if table is not None:
    for row in table.find_all('tr')[1:]:
        try:
            col = row.find_all('td')
            team_ = col[1].text
            players_used_ = col[2].text
            minutes_nonforeign_ = col[3].text
            minutes_foreign_ = col[4].text
            team.append(team_)
            players_used.append(players_used_)
            minutes_nonforeign.append(minutes_nonforeign_)
            minutes_foreign.append(minutes_foreign_)
        except IndexError:
            # the row has fewer than five cells
            team.append('')
            players_used.append('')
            minutes_nonforeign.append('')
            minutes_foreign.append('')
else:
    team.append('')
    players_used.append('')
    minutes_nonforeign.append('')
    minutes_foreign.append('')
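
If you run this inside the loop over many URLs, it may be easier to move the check into a small helper that returns an empty list when the table is missing. Below is a minimal sketch of that idea, not a drop-in replacement. The function name parse_items_table and the two inline HTML snippets are made up for illustration, so the example runs without hitting the site.

```python
from bs4 import BeautifulSoup


def parse_items_table(html):
    """Return the data rows of the first <table class="items"> as lists of cell text.

    Returns an empty list when the page contains no such table.
    (Hypothetical helper, not part of the original code.)
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "items"})
    if table is None:
        # soup.find() found nothing, so there is nothing to iterate over
        return []

    rows = []
    for row in table.find_all("tr")[1:]:  # skip the header row
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if cells:  # ignore rows without any <td> cells
            rows.append(cells)
    return rows


# Made-up pages standing in for real responses
page_with_table = """
<table class="items">
  <tr><th>#</th><th>Club</th><th>Players used</th></tr>
  <tr><td>1</td><td>Club A</td><td>25</td></tr>
  <tr><td>2</td><td>Club B</td><td>27</td></tr>
</table>
"""
page_without_table = "<html><body><p>No data for this season</p></body></html>"

print(parse_items_table(page_with_table))     # two rows of cell text
print(parse_items_table(page_without_table))  # empty list, no AttributeError
```

In the real loop you could call the helper with response.text and, when it returns an empty list, print or log the url so you can see which league/season pages have no table.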

Upvotes: 1
