Reputation: 1057
I am trying to find an elegant way to scrape tables from a website. However, when I run the script below I get a "ValueError: No tables found" error.
from selenium import webdriver
import pandas as pd
driver = webdriver.Chrome(executable_path=r'C:...\chromedriver.exe')
driver.implicitly_wait(30)
driver.get("https://www.gallop.co.za/#meeting#20201125#3")
df_list=pd.read_html(driver.find_element_by_id("eventTab_4").get_attribute('outerHTML'))
When I inspect the site's elements, I notice that the code above works when the <table> tag lies directly within the <div id="...">. However, in this case, I think the code is not working because of the following reasons:
I would be grateful for advice on how to pull the tables for all races. There are several tables, each made visible when the user clicks the corresponding tab (race), and I need to extract all of them into separate dataframes.
Upvotes: 1
Views: 68
Reputation: 2227
from selenium import webdriver
import time
import pandas as pd
pd.set_option('display.max_columns', None)  # Show every column when printing
driver = webdriver.Chrome(executable_path='C:/bin/chromedriver.exe')
driver.get("https://www.gallop.co.za/#meeting#20201125#3")
time.sleep(5)
tab = driver.find_element_by_id('tabs')  # All tabs are here
li_list = tab.find_elements_by_tag_name('li')  # They are in a "li"
a_list = []
for li in li_list[1:]:  # First tab has nothing, so we skip it
    a = li.find_element_by_tag_name('a')  # Extract the "a" element from the "li"
    a_list.append(a)
df = []
for a in a_list:
    a.click()  # Next tab
    time.sleep(8)  # Tables take some time to load fully
    page = driver.page_source  # Get the HTML of the new tab page
    source = pd.read_html(page)
    table = source[1]  # Get 2nd table
    df.append(table)
print(df)
Output
[   Silk  No              Horse  Unnamed: 3   ACS  SH  CFR         Trainer  \
0   NaN   1           JEM ROCK         NaN   4bC   A  NaN      Eric Sands
1   NaN   2       MAISON MERCI         NaN  3chC   A  NaN  Brett Crawford
2   NaN   3         WORDSWORTH         NaN   3bC  AB  NaN     Greg Ennion
3   NaN   4    FOUND THE DREAM         NaN   3bG   A  NaN     Adam Marcus
4   NaN   5             IZAPHA         NaN   3bG   A  NaN       Andre Nel
5   NaN   6        JACKBEQUICK         NaN  3grG   A  NaN     Glen Kotzen
6   NaN   7           MHLABENI         NaN  3chG   A  NaN      Eric Sands
7   NaN   8              ORLOV         NaN   3bG   A  NaN     Adam Marcus
8   NaN   9           T'CHALLA         NaN   3bC   A  NaN   Justin Snaith
9   NaN  10  WEST COAST WONDER         NaN   3bG   A  NaN      Piet Steyn

     Jockey  Wgt  MR  Dr  Odds  Last 3 Runs
0  D Dillon   60   0   7   NaN          NaN
continued
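The ValueError from the question is what pd.read_html raises when the markup it receives contains no <table> element, which is likely the case for the outerHTML of a tab container whose table has not loaded yet. As a minimal sketch (the helper name read_tables_or_empty and the sample markup are made up for illustration and are not part of pandas or Selenium), the call can be wrapped so that table-less markup yields an empty list instead of an exception:

```python
from io import StringIO

import pandas as pd


def read_tables_or_empty(html):
    """Return every <table> in `html` as a DataFrame, or [] if there is none.

    Hypothetical helper for illustration; not part of pandas or Selenium.
    """
    # Skip parsing entirely when the markup contains no <table> tag.
    if "<table" not in html.lower():
        return []
    try:
        # StringIO avoids the deprecation warning that newer pandas
        # versions emit for literal HTML strings.
        return pd.read_html(StringIO(html))
    except ValueError:
        # pandas raises ValueError("No tables found") when nothing in the
        # markup parses as a table.
        return []


# Made-up markup standing in for driver.page_source or an element's outerHTML.
empty_tab = '<div id="eventTab_4"></div>'
loaded_tab = (
    "<table>"
    "<tr><th>No</th><th>Horse</th></tr>"
    "<tr><td>1</td><td>JEM ROCK</td></tr>"
    "</table>"
)

print(read_tables_or_empty(empty_tab))
print(read_tables_or_empty(loaded_tab)[0])
```

In the loop above, read_tables_or_empty(driver.page_source) could stand in for the direct pd.read_html(page) call, with a length check on the result before indexing into it, so a tab that has not finished loading is skipped rather than aborting the whole run.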
Upvotes: 1