Selenium scraped data to pandas dataframe

Question

This is my first attempt at scraping with Selenium.

I have collected the data I want, but I need to load it into a pandas DataFrame so I can do some calculations.

The sample code below shows how I get the data.

(It is financial data, and td[2] and td[3] hold the values for the years 2016 and 2017, respectively.)

from selenium.webdriver.common.by import By

# td[2] holds the 2016 value, td[3] the 2017 value
nf1 = driver.find_element(By.XPATH, '//*[@id="tbodyMTablo"]/tr[84]/td[2]').text
nf2 = driver.find_element(By.XPATH, '//*[@id="tbodyMTablo"]/tr[84]/td[3]').text

do_v1 = driver.find_element(By.XPATH, '//*[@id="tbodyMTablo"]/tr[2]/td[2]').text
do_v2 = driver.find_element(By.XPATH, '//*[@id="tbodyMTablo"]/tr[2]/td[3]').text

kvb_1 = driver.find_element(By.XPATH, '//*[@id="tbodyMTablo"]/tr[29]/td[2]').text
kvb_2 = driver.find_element(By.XPATH, '//*[@id="tbodyMTablo"]/tr[29]/td[3]').text

The data is numerical, but it is stored as str (probably because of .text), and neither int(nf2) nor float(nf2) worked.
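A likely cause, although the question does not show the actual cell text, is number formatting: int() and float() reject strings with thousands separators such as "1.234.567", and float() also rejects a decimal comma such as "12,5". The sketch below assumes Turkish/European-style formatting ("." for thousands, "," for decimals) and treats values in parentheses as negative; both are assumptions to check against the real page, and `parse_number` is a hypothetical helper name.

```python
def parse_number(text, thousands_sep=".", decimal_sep=","):
    """Convert a scraped numeric string such as '1.234.567,89' to a float.

    The default separators are an assumption (Turkish/European style);
    pass different ones if the page formats numbers differently.
    """
    cleaned = text.strip()
    # Assumption: negatives may be written in parentheses, e.g. '(1.234)'.
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1]
    # Drop thousands separators first, then turn the decimal mark into '.'.
    cleaned = cleaned.replace(thousands_sep, "").replace(decimal_sep, ".")
    if cleaned in ("", "-"):
        # Empty cell or a bare dash: there is no number to convert.
        return float("nan")
    value = float(cleaned)
    return -value if negative else value


print(parse_number("1.234.567,89"))  # 1234567.89
print(parse_number("(12.500)"))  # -12500.0
print(parse_number("1,234.5", thousands_sep=",", decimal_sep="."))  # 1234.5
```

In the question's code this would mean calling `parse_number(nf2)` instead of `float(nf2)`. Note that `float("1.234")` does not fail at all; it silently returns 1.234 instead of 1234, so the format is worth checking even when no error is raised.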

Is there any way to store the values as numbers in the first place? (Without .text, it returns 0.)

What is the proper way to scrape numerical data and store it in a DataFrame?

Thanks in advance.

Peter Bejan · Accepted Answer

Try using .get_attribute('innerHTML') instead of .text.

Edit:

It seems that you are trying to pass a Selenium object to int(), but int() needs a string (one that contains only digits).

So you can try converting it like this.

This example scrapes a number from a field on an arbitrary Wikipedia page; adapt it to your code.

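from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

driver.get('https://it.wikipedia.org/wiki/Internet#Nascita_del_World_Wide_Web_.281991.29')

# Table-of-contents number; this XPath depends on Wikipedia's markup, which may have changed
scraped = driver.find_element(By.XPATH, '//span[@class="tocnumber" and contains(text(), "1")]')

# get_attribute('innerHTML') returns a str, which int() accepts if it contains only digits
print(int(scraped.get_attribute('innerHTML')))

driver.quit()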
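To go from individual strings to a DataFrame, one option is to collect the scraped texts first and then convert whole columns at once with `pandas.to_numeric`. The sketch below uses made-up sample strings in place of the question's `nf1`/`nf2`, `do_v1`/`do_v2` and `kvb_1`/`kvb_2` values (the numbers and row labels are hypothetical), and it assumes "." as the thousands separator and "," as the decimal mark, which should be checked against the real page.

```python
import pandas as pd

# Hypothetical sample of what the XPaths might return.
# Each row label stands for one table row; values are [2016 text, 2017 text].
scraped = {
    "nf": ["1.234.567", "1.456.789"],
    "do_v": ["98.765", "102.345"],
    "kvb": ["(12.500)", "8.250"],
}


def to_number(series, thousands_sep=".", decimal_sep=","):
    """Convert a column of formatted strings to numbers (separators are assumed)."""
    cleaned = (
        series.str.strip()
        .str.replace(thousands_sep, "", regex=False)
        .str.replace(decimal_sep, ".", regex=False)
        # Assumption: '(123)' means -123, a common accounting convention.
        .str.replace(r"^\((.*)\)$", r"-\1", regex=True)
    )
    # errors="coerce" turns anything still unparseable into NaN instead of raising.
    return pd.to_numeric(cleaned, errors="coerce")


# One row per metric, one column per year.
df = pd.DataFrame.from_dict(scraped, orient="index", columns=[2016, 2017])
df = df.apply(to_number)

# Once the columns are numeric, calculations work as usual.
df["change"] = df[2017] - df[2016]
print(df)
```

With a live `driver`, the `scraped` dict could be filled in a loop over the question's row numbers (84, 2 and 29) and cell indices (2 and 3) instead of six separate calls. Using `errors="coerce"` is a design choice: a cell that cannot be parsed becomes NaN rather than stopping the whole conversion.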
