Reputation: 319
Using Selenium, I am trying to scrape a table from a website.
However, the data appears compressed into one single column rather than two separate columns, Date
and Value.
Any help would be greatly appreciated. The question has now been amended to include further improvements.
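import pandas as pd
from selenium.webdriver.common.by import By

# `driver` is assumed to be an already-created Selenium WebDriver
driver.get("https://www.multpl.com/shiller-pe/table/by-year/")
table_trs = driver.find_elements(By.XPATH, '//table[@id="datatable"]/tbody/tr')
value_list = []
for row in table_trs:
    # row.text returns the text of the whole row as a single string
    value_list.append(row.text)
print(value_list)
df = pd.DataFrame(value_list)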
The table I am trying to scrape appears on the website as shown in the first screenshot,
and the section of HTML associated with it is shown in the second.
Upvotes: 0
Views: 4385
Reputation: 25048
Note: This answer focuses on the correct usage of XPath and is based only on your screenshot. Improving your question by posting code and examples as text would generate more specific answers.
To get the <tr> elements of the table by XPath, change it to //table[@id="datatable"]/tbody/tr:
from selenium.webdriver.common.by import By
table_trs = driver.find_elements(By.XPATH, '//table[@id="datatable"]/tbody/tr')
Based on your improvements, you can use pandas.read_html() to convert the table directly into a DataFrame. You only have to rename the columns, because there are two <span> tags in the <th>, which would otherwise lead to the column name "Value Value":
driver.get("https://www.multpl.com/shiller-pe/table/by-year/")
df = pd.read_html(driver.page_source)[0]
df.columns = ['Date','Value']
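The read_html() step can also be tried offline, without a browser session. The following is a minimal sketch, not the site's real markup: the HTML string is a hypothetical stand-in for driver.page_source, and its header row with two <span> tags is an assumption based on the description above. It also assumes an HTML parser that pandas can use (such as lxml) is installed. The markup is wrapped in io.StringIO because recent pandas versions deprecate passing a literal HTML string.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for driver.page_source (not the real page markup)
page_source = """
<table id="datatable">
  <tbody>
    <tr><th>Date</th><th><span>Value</span> <span>Value</span></th></tr>
    <tr><td>Feb 4, 2022</td><td>37.18</td></tr>
    <tr><td>Jan 1, 2022</td><td>39.63</td></tr>
  </tbody>
</table>
"""

# read_html returns a list of DataFrames, one per <table> it finds
df = pd.read_html(StringIO(page_source))[0]

# Replace whatever header text was parsed with clean column names
df.columns = ["Date", "Value"]
print(df)
```

With a live session, the same pattern would be pd.read_html(StringIO(driver.page_source))[0].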
As an alternative, you could iterate over the rows like this:
driver.get("https://www.multpl.com/shiller-pe/table/by-year/")
table_trs = driver.find_elements(By.XPATH, '//table[@id="datatable"]/tbody/tr')
value_list = []
# Skip the first row, which holds the header
for row in table_trs[1:]:
    # Look up the row's cells once and reuse them
    cells = row.find_elements(By.TAG_NAME, "td")
    value_list.append({
        'Date': cells[0].text,
        'Value': cells[1].text
    })
df = pd.DataFrame(value_list)
| Date | Value |
|---|---|
| Feb 4, 2022 | 37.18 |
| Jan 1, 2022 | 39.63 |
| Jan 1, 2021 | 34.51 |
| Jan 1, 2020 | 30.99 |
| Jan 1, 2019 | 28.38 |
| Jan 1, 2018 | 33.31 |
| Jan 1, 2017 | 28.06 |
| Jan 1, 2016 | 24.21 |
| Jan 1, 2015 | 26.49 |
| ... | ... |
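Because .text always returns strings, both columns built by the row iteration hold text rather than dates and numbers. A minimal sketch of converting them follows. It uses two sample rows copied from the output above in place of the scraped list, and it assumes every date follows the "Feb 4, 2022" pattern and that month names are parsed in an English locale.

```python
import pandas as pd

# Sample rows copied from the output above, standing in for the scraped list
value_list = [
    {"Date": "Feb 4, 2022", "Value": "37.18"},
    {"Date": "Jan 1, 2022", "Value": "39.63"},
]
df = pd.DataFrame(value_list)

# Parse "Feb 4, 2022"-style strings into datetime values
df["Date"] = pd.to_datetime(df["Date"], format="%b %d, %Y")
# Convert the numeric strings into floats
df["Value"] = pd.to_numeric(df["Value"])

print(df.dtypes)
```

If some rows on the real page deviate from this date pattern, pd.to_datetime will raise an error unless errors="coerce" is passed.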
Upvotes: 3
Reputation: 2183
Something is missing here, or you used the wrong XPath.
A valid XPath would be (based on the picture) "//div[@id="datatable"]/tbody/tr"
, but that will only give you the rows. You can iterate over the row and column numbers, using something like //div[@id="datatable"]/tbody/tr[i]/td[j]
, and then get the text from each element.
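The tr[i]/td[j] indexing can be sketched without a browser. The example below is an illustration under stated assumptions: it swaps in the standard library's xml.etree.ElementTree (which supports a small XPath subset) for Selenium, the markup is a hypothetical stand-in, and it uses a <table> element as in the question's own locator, so the tag should be changed if the real page differs. The helper cell_xpath is a made-up name. Note that XPath positions start at 1, and that row 1 is assumed to be the header row.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in markup; the real page may be structured differently
PAGE = """
<html><body>
<table id="datatable">
  <tbody>
    <tr><th>Date</th><th>Value</th></tr>
    <tr><td>Feb 4, 2022</td><td>37.18</td></tr>
    <tr><td>Jan 1, 2022</td><td>39.63</td></tr>
  </tbody>
</table>
</body></html>
""".strip()

ROWS_XPATH = ".//table[@id='datatable']/tbody/tr"


def cell_xpath(i, j):
    """Build the XPath for row i, column j (both 1-based)."""
    return f"{ROWS_XPATH}[{i}]/td[{j}]"


root = ET.fromstring(PAGE)
n_rows = len(root.findall(ROWS_XPATH))

records = []
# Start at 2 to skip the header row, which has <th> cells and no <td> cells
for i in range(2, n_rows + 1):
    records.append({
        "Date": root.find(cell_xpath(i, 1)).text,
        "Value": root.find(cell_xpath(i, 2)).text,
    })

print(records)
```

With Selenium, the equivalent lookup for one cell would be driver.find_element(By.XPATH, '//table[@id="datatable"]/tbody/tr[2]/td[1]').text. Issuing one lookup per cell is slower than fetching the rows once and reading their cells, so this form is mainly useful for picking out individual cells.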
Upvotes: 1