nrcjea001

Reputation: 1057

Reading tables with Pandas and Selenium

I am trying to find an elegant way to scrape tables from a website. However, when I run the script below, I get the error "ValueError: No tables found".

from selenium import webdriver
import pandas as pd

driver = webdriver.Chrome(executable_path=r'C:...\chromedriver.exe')
driver.implicitly_wait(30)

driver.get("https://www.gallop.co.za/#meeting#20201125#3")

df_list=pd.read_html(driver.find_element_by_id("eventTab_4").get_attribute('outerHTML'))

When I inspect the page elements, I notice that the code above works when the <table> tag sits directly inside the <div id="...">. In this case, however, I think it fails for the following reasons:

  1. The <table> tag is nested inside a <div> that is itself inside another <div>.
  2. The site uses JavaScript to build the tables.
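
As a quick check on the first suspicion, here is a minimal sketch. It assumes lxml is installed as the parser backend for pandas, and it uses a made-up HTML snippet (the markup and the "inner" class are illustrative, not copied from the site). It suggests that pd.read_html finds a table nested two <div> levels deep, so nesting alone is unlikely to be the cause; the JavaScript-built content is the more likely one.

```python
import io

import pandas as pd

# Hypothetical markup: a table nested inside two <div> elements.
# This is NOT the real page source, only an illustration of the nesting.
NESTED_HTML = """
<div id="eventTab_4">
  <div class="inner">
    <table>
      <thead><tr><th>No</th><th>Horse</th></tr></thead>
      <tbody>
        <tr><td>1</td><td>JEM ROCK</td></tr>
        <tr><td>2</td><td>MAISON MERCI</td></tr>
      </tbody>
    </table>
  </div>
</div>
"""

# Wrap the string in StringIO; newer pandas versions deprecate passing
# literal HTML strings directly to read_html.
tables = pd.read_html(io.StringIO(NESTED_HTML))

print(len(tables))   # number of <table> elements found
print(tables[0])     # the nested table as a DataFrame
```

If the same call fails on the outerHTML taken from the live page, the table markup was probably not in that element yet at the moment it was read.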

I would be grateful for advice on how to pull the tables for all races. There are several tables, each made visible when the user clicks on its tab (race), and I need to extract all of them into separate dataframes.

Upvotes: 1

Views: 68

Answers (1)

Abhishek Rai

Reputation: 2227

from selenium import webdriver
import time
import pandas as pd

pd.set_option('display.max_columns', None)  # Show every column when printing
driver = webdriver.Chrome(executable_path='C:/bin/chromedriver.exe')
driver.get("https://www.gallop.co.za/#meeting#20201125#3")
time.sleep(5)
tab = driver.find_element_by_id('tabs')  # All tabs live inside this element
li_list = tab.find_elements_by_tag_name('li')  # Each tab is an "li"
a_list = []
for li in li_list[1:]:  # The first tab has nothing, so skip it
    a = li.find_element_by_tag_name('a')  # Extract the "a" element from the "li"
    a_list.append(a)

df = []
for a in a_list:
    a.click()  # Switch to the next tab
    time.sleep(8)  # Tables take some time to load fully
    page = driver.page_source  # Get the HTML of the newly shown tab
    source = pd.read_html(page)
    table = source[1]  # Take the second table on the page
    df.append(table)

print(df)

Output

 [   Silk  No              Horse  Unnamed: 3   ACS  SH  CFR         Trainer  \
0   NaN   1           JEM ROCK         NaN   4bC   A  NaN      Eric Sands   
1   NaN   2       MAISON MERCI         NaN  3chC   A  NaN  Brett Crawford   
2   NaN   3         WORDSWORTH         NaN   3bC  AB  NaN     Greg Ennion   
3   NaN   4    FOUND THE DREAM         NaN   3bG   A  NaN     Adam Marcus   
4   NaN   5             IZAPHA         NaN   3bG   A  NaN       Andre Nel   
5   NaN   6        JACKBEQUICK         NaN  3grG   A  NaN     Glen Kotzen   
6   NaN   7           MHLABENI         NaN  3chG   A  NaN      Eric Sands   
7   NaN   8              ORLOV         NaN   3bG   A  NaN     Adam Marcus   
8   NaN   9           T'CHALLA         NaN   3bC   A  NaN   Justin Snaith   
9   NaN  10  WEST COAST WONDER         NaN   3bG   A  NaN      Piet Steyn   

                 Jockey  Wgt  MR  Dr  Odds       Last 3 Runs  
0              D Dillon   60   0   7   NaN               NaN  

continued
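
The script above picks the table by position (source[1]), which breaks if the page ever puts another table first. A possible alternative, sketched here as an assumption and not as part of the original answer, is to pick the table by its column names. The helper name pick_table and the required column set are hypothetical; the column names themselves come from the output shown above.

```python
import pandas as pd


def pick_table(tables, required_columns=("No", "Horse", "Jockey")):
    """Return the first DataFrame containing every required column, else None.

    `tables` is the list returned by pd.read_html. The default column
    names are taken from the printed output and are an assumption about
    what identifies the runners table.
    """
    required = set(required_columns)
    for table in tables:
        if required.issubset(str(col) for col in table.columns):
            return table
    return None


# Small self-contained demonstration with hand-made frames.
other = pd.DataFrame({"Race": [1, 2]})
runners = pd.DataFrame(
    {"No": [1], "Horse": ["JEM ROCK"], "Jockey": ["D Dillon"]}
)

chosen = pick_table([other, runners])
print(chosen)
```

In the loop this would stand in for table = source[1], for example table = pick_table(source). Note that it returns the first match only, so if tables from earlier tabs stay in the page source, further filtering would be needed.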

Upvotes: 1
