I am scraping the two tables from this page: https://www.transfermarkt.com/premier-league/legionaereeinsaetze/wettbewerb/GB1/plus/?option=spiele&saison_id=2017&altersklasse=alle
I want data for many countries and seasons, so I have set up a list of league URLs, one per country.
Here is my code:
import time

import pandas as pd
import requests
from bs4 import BeautifulSoup

# league_urls is defined elsewhere (a list of league URLs)
# The first loop is for each league
for l in range(0, len(league_urls)):
    time.sleep(0.5)
    #The second loop is for each year we want to scrape
    for n in range(2007,2020):
        time.sleep(0.5)
        df_soccer1 = None
        url = league_urls[l] + str(n) + str('&altersklasse=alle')
        headers = {"User-Agent":"Mozilla/5.0"}
        response = requests.get(url, headers=headers, verify=False)
        time.sleep(0.5)
        soup = BeautifulSoup(response.text, 'html.parser')
        #Table 1 with information about the value
        table = soup.find("table", {"class" : "items"})
        team = []
        players_used = []
        minutes_nonforeign = []
        minutes_foreign = []
        for row in table.find_all('tr')[1:]:
            try:
                col = row.find_all('td')
                team_ = col[1].text
                players_used_ = col[2].text
                minutes_nonforeign_ = col[3].text
                minutes_foreign_ = col[4].text
                team.append(team_)
                players_used.append(players_used_)
                minutes_nonforeign.append(minutes_nonforeign_)
                minutes_foreign.append(minutes_foreign_)
            except:
                team.append('')
                players_used.append('')
                minutes_nonforeign.append('')
                minutes_foreign.append('')
        team = [elem.replace('\n','').replace('\xa0','').strip() for elem in team]
        #Table 2 with information about placement, goals and points
        df_soccer2 = None
        table2 = soup.find("div", {"class" : "box tab-print"})
        team2 = []
        place = []
        matches = []
        difference = []
        pts = []
        for row in table2.find_all('tr'):
            try:
                col = row.findAll('td')
                team2_ = col[2].text
                place_ = col[0].text
                matches_ = col[3].text
                difference_ = col[4].text
                pts_ = col[5].text
                team2.append(team2_)
                place.append(place_)
                matches.append(matches_)
                difference.append(difference_)
                pts.append(pts_)
            except:
                team2.append('')
                place.append('')
                matches.append('')
                difference.append('')
                pts.append('')
        team2 = [elem.replace('\n','').replace('\xa0','').strip() for elem in team2]
        df_soccer1 = pd.DataFrame({'Team': team[1:], 'Season': [n]*(len(team)-1), 'Players used': players_used[1:],
                                   'Minutes nonforeign': minutes_nonforeign[1:], 'Minutes foreign': minutes_foreign[1:]})
        df_soccer2 = pd.DataFrame({'Team': team2, 'Place': place, 'Matches': matches, 'Difference': difference,
                                   'Points': pts})
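As written, df_soccer1 and df_soccer2 are rebuilt on every pass through the inner loop, so after the loops finish they hold only the last league and season. If the goal is one table covering all leagues and seasons, a common pattern is to collect the per-iteration frames in a list and concatenate once at the end. This is a minimal sketch with made-up team names and only two columns; `frames` and `all_seasons` are illustrative names, not part of the original code.

```python
import pandas as pd

# Hypothetical per-season results; in the real loop each entry would be
# the df_soccer1 frame built for one (league, season) pair.
frames = []
for season, teams in [(2017, ["Team A", "Team B"]), (2018, ["Team A", "Team C"])]:
    # A scalar "Season" value is broadcast to every row of the frame
    frames.append(pd.DataFrame({"Team": teams, "Season": season}))

# Concatenate once after the loop instead of overwriting a single frame
all_seasons = pd.concat(frames, ignore_index=True)
print(all_seasons)
```

Appending to a Python list and calling pd.concat once is generally cheaper than growing a DataFrame row by row inside the loop.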
I get this error when scraping the first table:
AttributeError Traceback (most recent call last)
<ipython-input-46-b4cd681f68e8> in <module>
21 minutes_foreign = []
22
---> 23 for row in table.find_all("tr")[1:]:
24 try:
25 col = row.find_all('td')
AttributeError: 'NoneType' object has no attribute 'find_all'
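For context on the error itself: BeautifulSoup's find() returns None when no element matches, and calling find_all on None raises exactly this AttributeError. A minimal reproduction, using a hypothetical page fragment that lacks the table (no network access needed):

```python
from bs4 import BeautifulSoup

# Hypothetical page with no <table class="items">, e.g. an error page
# or a season without data
html = "<html><body><p>No data for this season</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# find() returns None when no element matches
table = soup.find("table", {"class": "items"})
print(table)  # None

error_message = None
try:
    table.find_all("tr")  # same call as line 23 in the traceback
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```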
Note that league_urls is a long list of URLs.
I have used similar code on another part of the site and it works fine; I can't figure out why it fails on this one.
Also, when I run the code on a single URL, it works fine. Could the problem be that I am looping over 13 seasons (2007 to 2019) for 55 different URLs?
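One way to check whether the loop itself is the problem is to record which URLs fail instead of letting the first failure stop the run. A plausible guess (not confirmed by the question) is that some league/season pages either return a non-200 status or come back without a `<table class="items">`. The sketch below works under that assumption: fetch_html and scrape_items_tables are hypothetical helpers, `fetch` is injectable so the logic can be tried without hitting the site, and the original `verify=False` is left out of the request.

```python
import time

from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return (status_code, html). Hypothetical helper."""
    import requests  # imported here so the parsing logic runs without it

    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=30)
    return response.status_code, response.text


def scrape_items_tables(urls, fetch=fetch_html, delay=0.5):
    """Return (tables, failures).

    tables maps each URL to its <table class="items"> Tag; failures lists
    (url, reason) for pages with a non-200 status or no such table.
    """
    tables = {}
    failures = []
    for url in urls:
        status, html = fetch(url)
        if status != 200:
            failures.append((url, f"HTTP {status}"))
        else:
            soup = BeautifulSoup(html, "html.parser")
            table = soup.find("table", {"class": "items"})
            if table is None:
                failures.append((url, "no items table"))
            else:
                tables[url] = table
        time.sleep(delay)  # pause between requests
    return tables, failures
```

Inspecting `failures` after a full run shows whether the errors cluster on particular leagues or seasons or appear scattered across the run; the latter might hint at throttling rather than missing data.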
Test whether table is None before iterating over it, e.g.:
import requests
from bs4 import BeautifulSoup

url = 'https://www.transfermarkt.com/remier-liga/legionaereeinsaetze/wettbewerb/RU1/plus/?option=spiele&saison_id=2011&altersklasse=alle'
headers = {"User-Agent":"Mozilla/5.0"}
response = requests.get(url, headers=headers, verify=False)
#time.sleep(0.5)
soup = BeautifulSoup(response.text, 'html.parser')
#Table 1 with information about the value
table = soup.find("table", {"class" : "items"})
team = []
players_used = []
minutes_nonforeign = []
minutes_foreign = []
# soup.find() returns None when the page has no such table
if table is not None:
    for row in table.find_all('tr')[1:]:
        try:
            col = row.find_all('td')
            team_ = col[1].text
            players_used_ = col[2].text
            minutes_nonforeign_ = col[3].text
            minutes_foreign_ = col[4].text
            team.append(team_)
            players_used.append(players_used_)
            minutes_nonforeign.append(minutes_nonforeign_)
            minutes_foreign.append(minutes_foreign_)
        except:
            team.append('')
            players_used.append('')
            minutes_nonforeign.append('')
            minutes_foreign.append('')
else:
    team.append('')
    players_used.append('')
    minutes_nonforeign.append('')
    minutes_foreign.append('')
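A variant of the same idea, sketched under the assumption that rows with too few `<td>` cells (such as header rows) can simply be skipped rather than padded with empty strings. extract_rows is a hypothetical helper: it returns one dict per usable row, so the columns cannot drift out of alignment, and a missing table gives an empty list instead of an AttributeError.

```python
from bs4 import BeautifulSoup


def extract_rows(table, columns):
    """Return one dict per <tr> that has enough <td> cells.

    columns maps an output name to a <td> index, e.g. {"Team": 1}.
    A missing table (None) yields an empty list.
    """
    if table is None:
        return []
    needed = max(columns.values()) + 1
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) < needed:
            continue  # skip header rows (<th> only) and short rows
        # get_text(strip=True) trims whitespace, including non-breaking spaces
        rows.append({name: cells[i].get_text(strip=True) for name, i in columns.items()})
    return rows
```

For example, `extract_rows(soup.find("table", {"class": "items"}), {"Team": 1, "Players used": 2})` would give a list that can be passed straight to `pd.DataFrame`.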