Reputation: 111
I am getting a KeyError: 'title' error in my web scraping program and am not sure what the issue is. When I inspect the webpage, I can see the element that I am trying to find:
import pandas as pd
import requests
from bs4 import BeautifulSoup
import re
url = 'https://www.ncaagamesim.com/college-basketball-predictions.asp'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
table = soup.find('table')
# Get column names
headers = table.find_all('th')
cols = [x.text for x in headers]
# Get all rows in table body
table_rows = table.find_all('tr')
rows = []
# Grab the text of each td, and put into a rows list
for each in table_rows[1:]:
    odd_avail = True
    data = each.find_all('td')
    time = data[0].text.strip()
    try:
        matchup, odds = data[1].text.strip().split('\xa0')
        odd_margin = float(odds.split('by')[-1].strip())
    except:
        matchup = data[1].text.strip()
        odd_margin = '-'
        odd_avail = False
    odd_team_win = data[1].find_all('img')[-1]['title']
    sim_team_win = data[2].find('img')['title']
    sim_margin = float(re.findall("\d+\.\d+", data[2].text)[-1])
    if odd_avail == True:
        if odd_team_win == sim_team_win:
            diff = sim_margin - odd_margin
        else:
            diff = -1 * odd_margin - sim_margin
    else:
        diff = '-'
    row = {cols[0]: time, 'Matchup': matchup, 'Odds Winner': odd_team_win, 'Odds': odd_margin,
           'Simulation Winner': sim_team_win, 'Simulation Margin': sim_margin, 'Diff': diff}
    rows.append(row)
df = pd.DataFrame(rows)
print(df.to_string())
# df.to_csv('odds.csv', index=False)
I am getting the error on the line that sets sim_team_win. That line takes data[2], which is the 3rd column on the website, and reads the img title to get the team name. Is it because the img is nested inside another div? Also, when I run this code it does not print out the "Odds" column, which is stored in the odd_margin variable. Is there something wrong with how that variable is set? Thanks in advance for the help!
Upvotes: 0
Views: 652
Reputation: 1432
As for not finding the img title: if you look at the row with New Mexico @ Dixie State, there is no image in the third column, and there is no img title in the page source either.
For the Odds column, once I wrap the sim_team_win assignment in a try/except, I get all the Odds values in the table. Nothing was printed before because the script stopped at the KeyError before it reached the print.
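One way to guard the lookup without a bare try/except is a small helper that returns a placeholder when the image or its title is missing. This is only a sketch: the helper name img_title, the default value '-', and the last flag are made up for illustration. It assumes the cell behaves like a BeautifulSoup Tag, that is, it has find_all('img') and each image supports .get('title').

```python
def img_title(cell, default='-', last=False):
    """Return the title of an <img> inside a table cell, or `default`.

    Hypothetical helper: `cell` is assumed to behave like a BeautifulSoup
    Tag (find_all('img') returns a list, each image supports .get('title')).
    """
    images = cell.find_all('img')
    if not images:
        # No <img> in this cell at all
        return default
    # Pick the last image (as the odds column does) or the first one
    img = images[-1] if last else images[0]
    # .get returns None instead of raising KeyError when 'title' is absent
    title = img.get('title')
    return title if title else default
```

In the loop from the question, the two assignments would then become odd_team_win = img_title(data[1], last=True) and sim_team_win = img_title(data[2]). Rows without an image get '-' as the team name instead of stopping the script.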
Upvotes: 1