I am using Python Selenium Webdriver to pull some information from the following site: http://www.ukathletics.com/schedule-list/#!/m-basebl/2016
I am interested in pulling some links, dates and team names. The code below identifies the correct elements, but it only grabs their contents up to a certain point; after that it appends empty strings (i.e. '') to my lists.
I know that all of the lists should have 66 items if pulled correctly (Kentucky played 66 games). Any ideas why it stops pulling the information after the second LSU game?
from selenium import webdriver

bs = [] #boxscores
team2 = [] #opponents
dates = [] #dates of games
team1 = 'KENTUCKY' #team of interest
driver = webdriver.Chrome()
driver.get('http://www.ukathletics.com/schedule-list/#!/m-basebl/2016')
elem = driver.find_elements_by_class_name('event_link')
for i in elem:
    bs.append(i.get_attribute('href'))
links = sorted(set(bs), key=lambda x: bs.index(x))
elem = driver.find_elements_by_class_name('school_name')
team2 = [i.text for i in elem if i.text!=team1]
elem = driver.find_elements_by_class_name('date')
for i in elem:
    dates.append(i.text.replace(',','').replace('\n',' '))
print(links)
print(team2)
print(dates)
print(len(links))
print(len(team2))
print(len(dates))
MY RESULTS:
['http://www.ukathletics.com/game-center/580644ebe4b07dac0ca58a91/', 'http://www.ukathletics.com/game-center/5806455ce4b07dac0ca58a92/', 'http://www.ukathletics.com/game-center/58064594e4b09266491b651d/', 'http://www.ukathletics.com/game-center/5820d9dbe4b0493932cf30fd/', 'http://www.ukathletics.com/game-center/5820da33e4b0493932cf30fe/', 'http://www.ukathletics.com/game-center/5820da86e4b05e67c64470ca/', 'http://www.ukathletics.com/game-center/5820dabde4b0493932cf30ff/', 'http://www.ukathletics.com/game-center/5820daf4e4b05e67c64470cb/', 'http://www.ukathletics.com/game-center/5820db25e4b05e67c64470cc/', 'http://www.ukathletics.com/game-center/5820db6ce4b0493932cf3100/', 'http://www.ukathletics.com/game-center/5820db91e4b05e67c64470de/', 'http://www.ukathletics.com/game-center/5820dbb6e4b05e67c64470df/', 'http://www.ukathletics.com/game-center/5820dbe3e4b0493932cf3101/', 'http://www.ukathletics.com/game-center/5820dc0de4b05e67c64470e0/', 'http://www.ukathletics.com/game-center/58c1e98ee4b066e02ca82086/', 'http://www.ukathletics.com/game-center/5820dc32e4b05e67c64470e1/', 'http://www.ukathletics.com/game-center/5820dc80e4b0493932cf3102/', 'http://www.ukathletics.com/game-center/5820dcaae4b0493932cf3103/', 'http://www.ukathletics.com/game-center/5820dd1ee4b0493932cf3104/', 'http://www.ukathletics.com/game-center/5820dd6fe4b0493932cf3105/', 'http://www.ukathletics.com/game-center/5820dd8ce4b05e67c64470e3/', 'http://www.ukathletics.com/game-center/5820de21e4b05e67c64470e4/', 'http://www.ukathletics.com/game-center/5820de47e4b0493932cf3106/', 'http://www.ukathletics.com/game-center/5820de69e4b05e67c64470e5/', 'http://www.ukathletics.com/game-center/5820de87e4b0493932cf3107/', 'http://www.ukathletics.com/game-center/5820dea9e4b05e67c64470e6/', 'http://www.ukathletics.com/game-center/5820decee4b0493932cf3108/', 'http://www.ukathletics.com/game-center/5820deebe4b05e67c64470e7/', 'http://www.ukathletics.com/game-center/5820df0ce4b05e67c64470e8/', 
'http://www.ukathletics.com/game-center/5820df50e4b0493932cf3114/', 'http://www.ukathletics.com/game-center/5820df85e4b05e67c64470e9/', 'http://www.ukathletics.com/game-center/5820dfa9e4b05e67c64470ea/', 'http://www.ukathletics.com/game-center/5820dfc7e4b05e67c64470eb/', 'http://www.ukathletics.com/game-center/5820dfebe4b0493932cf3115/', 'http://www.ukathletics.com/game-center/5820e023e4b0493932cf3116/', 'http://www.ukathletics.com/game-center/5820e03ee4b0493932cf3117/', 'http://www.ukathletics.com/game-center/5820e056e4b0493932cf3118/', 'http://www.ukathletics.com/game-center/5820e089e4b0493932cf3119/', 'http://www.ukathletics.com/game-center/5820e0bee4b05e67c64470ed/', 'http://www.ukathletics.com/game-center/5820e0a4e4b05e67c64470ec/']
['NORTH CAROLINA', 'NORTH CAROLINA', 'NORTH CAROLINA', 'LIBERTY', "ST. JOSEPH'S", 'OLD DOMINION', 'DELAWARE', 'E. KENTUCKY', 'WKU', 'UC SANTA BARBARA', 'UC SANTA BARBARA', 'UC SANTA BARBARA', 'WRIGHT STATE', 'CINCINNATI', 'MIAMI (OH)', 'MIAMI (OH)', 'MIAMI (OH)', 'MURRAY STATE', 'TEXAS A&M', 'TEXAS A&M', 'TEXAS A&M', 'WKU', 'OLE MISS', 'OLE MISS', 'OLE MISS', 'CINCINNATI', 'VANDERBILT', 'VANDERBILT', 'VANDERBILT', 'LOUISVILLE', 'MISSISSIPPI STATE', 'MISSISSIPPI STATE', 'MISSISSIPPI STATE', 'UT MARTIN', 'MIZZOU', 'MIZZOU', 'MIZZOU', 'LOUISVILLE', 'LSU', 'LSU', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
['FRI FEB 17', 'SAT FEB 18', 'SUN FEB 19', 'WED FEB 22', 'FRI FEB 24', 'SAT FEB 25', 'SUN FEB 26', 'TUE FEB 28', 'WED MAR 1', 'FRI MAR 3', 'SAT MAR 4', 'SUN MAR 5', 'TUE MAR 7', 'WED MAR 8', 'THU MAR 9', 'FRI MAR 10', 'SUN MAR 12', 'TUE MAR 14', 'FRI MAR 17', 'SAT MAR 18', 'SUN MAR 19', 'TUE MAR 21', 'THU MAR 23', 'FRI MAR 24', 'SAT MAR 25', 'TUE MAR 28', 'FRI MAR 31', 'SAT APR 1', 'SUN APR 2', 'TUE APR 4', 'FRI APR 7', 'SAT APR 8', 'SUN APR 9', 'WED APR 12', 'FRI APR 14', 'SAT APR 15', 'SUN APR 16', 'TUE APR 18', 'FRI APR 21', 'FRI APR 21', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
40
120
80
Upvotes: 0
Views: 53
Reputation: 1748
Not all of the elements are fetched because not all of them have been loaded yet. If you watch the page carefully, the bottom rows of the table load only once you scroll down to the end of the page.
To load the complete table, try adding the code below right after opening the page:
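import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
driver.get('http://www.ukathletics.com/schedule-list/#!/m-basebl/2016')
time.sleep(5)  # wait for the initial rows to render
# Ctrl+End jumps to the bottom of the page so the remaining rows load
driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + Keys.END)
time.sleep(5)  # wait for the newly loaded rows
driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + Keys.END)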
I have tested it, and it gives the output below:
66 #print(len(links))
198 #print(len(team2))
132 #print(len(dates))
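If the fixed five-second sleeps turn out to be too short or too long, one alternative is to keep scrolling until the page stops growing. This is only a rough sketch, not something I have run against this site: the helper name scroll_until_stable and its pause and max_rounds parameters are made up for illustration, and it assumes the schedule grows document.body.scrollHeight as rows load (if the table scrolls inside its own container, this will not help). It relies only on the driver's execute_script method.

```python
import time


def scroll_until_stable(driver, pause=2.0, max_rounds=20):
    """Scroll to the bottom repeatedly until the page height stops changing.

    `driver` is any object exposing Selenium's `execute_script` method.
    Returns the number of scrolls performed (at most `max_rounds`).
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        # Jump to the current bottom of the page to trigger lazy loading.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give newly requested rows time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # Height did not change after this scroll: assume nothing more loads.
            return rounds
        last_height = new_height
    return max_rounds
```

You would call scroll_until_stable(driver) right after driver.get(...) and before any of the find_elements calls, in place of the two sleep/send_keys pairs.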
Upvotes: 1