wli2019

Reputation: 21

How to extract "text" from function in a JavaScript through Selenium?

Update: The script provided by Jonas has solved most of the problems. Now I am trying to find a way to set the date range, either through the date picker or with send_keys, since the range automatically moves by one day each time I re-run the code.

date_start = driver.find_element(By.XPATH, 'date_from')
date_end = driver.find_element(By.XPATH, 'date_to')
date_start.send_keys("2021-09-24")
date_end.send_keys("2021-10-01")

Original Problem: I am using Selenium's Chrome WebDriver to extract data from a table on a website whose contents cannot be highlighted for copy and paste. When I tried to extract the data with BeautifulSoup, I found that it sits inside a JavaScript function. The HTML code for the JavaScript table looks like this:

<script>

  function initTableData() {
    window.initialAnalystData = [{"action_company":"Initiates Coverage On","action_pt":"Announces","analyst":"BTIG","analyst_name":"James Sullivan","currency":"USD","lastTradePrice":24.89},"logo":null}];
    window.initialAnalystDate = {"date_from":"2021-09-24","date_to":"2021-10-01"};
    window.initialAnalystTime = "11:27";
  }

  initTableData();

</script>

I am new to both Selenium and JavaScript, but I have tried the following code to get the data list and it is not working.

element = driver.findElement(By.tagName("script"));
htmlCode = driver.executeScript("return arguments[0].innerHTML;", element)

What should I try next? The website link is here.

Thanks!

Upvotes: 2

Views: 726

Answers (1)

Jonas

Reputation: 1769

You could use a regular expression to find that part of the page source and then work with it:

from selenium import webdriver
import time
import re

url = 'https://www.benzinga.com/analyst-ratings'
driver = webdriver.Chrome()  # start a Chrome session
driver.get(url)
time.sleep(5)  # let the page load all the data first

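htmlSource = driver.page_source
# [29:] drops the leading 'window.initialAnalystData = [' and splitting on '{' yields one chunk per object
raw_data = re.findall(r'window\.initialAnalystData = .*;', htmlSource)[0][29:].split('{')[1:]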


#clean data if you want (just one possible way out of many!):

cleaned_data = {}
for data in raw_data:
    clean_data = data.split(',')
    details_to_dic = {}
    for details in clean_data:
        details_temp = details.replace('"', '')
        details_temp = details_temp.split(':')
        try:
            details_to_dic[details_temp[0]] = details_temp[1]
        except IndexError:
            # skip fragments that have no "key:value" shape
            pass

    cleaned_data[details_to_dic['name']] = details_to_dic

This gives you the data as a dictionary keyed by company name (example data for company APA):

print(cleaned_data['APA'])

output:

{'action_company': 'Downgrades', 'action_pt': 'Lowers', 'analyst': 'Citigroup', 'analyst_name': 'Scott Gruber', 'currency': 'USD', 'date': '2021-10-01', 'exchange': 'NASDAQ', 'id': '61573ba273a5f300019bb64a', 'importance': '0', 'name': 'APA', 'notes': '', 'pt_current': '23.0000', 'pt_prior': '27.0000', 'rating_current': 'Neutral', 'rating_prior': 'Buy', 'ticker': 'APA', 'time': '12', 'updated': '1633106928', 'url': 'https', 'url_calendar': 'https', 'url_news': 'https', 'quote': ''}
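
Note that the string splitting above cuts off any value that contains a colon (the URLs come out as 'https', the time as '12') and flattens the nested quote object. Since the assigned value looks like plain JSON, a possibly more robust alternative is to decode it with the json module. This is only a minimal sketch, assuming the value really is valid JSON: html_source is a made-up stand-in for driver.page_source, the record in it is invented for illustration, and extract_js_json is a hypothetical helper name.

```python
import json
import re

# Hypothetical stand-in for driver.page_source; the record below is invented
# and only mimics the shape of the data shown in the question.
html_source = '''
<script>
  function initTableData() {
    window.initialAnalystData = [{"name":"APA","analyst":"Citigroup","pt_current":"23.0000","url":"https://example.com/apa","quote":{"lastTradePrice":24.89}}];
    window.initialAnalystDate = {"date_from":"2021-09-24","date_to":"2021-10-01"};
  }
</script>
'''


def extract_js_json(source, variable):
    """Decode the JSON value assigned to window.<variable> in the page source."""
    # Locate the end of "window.<variable> = " ...
    match = re.search(r'window\.' + re.escape(variable) + r'\s*=\s*', source)
    if match is None:
        raise ValueError(f'window.{variable} not found in page source')
    # ... and let the JSON decoder read exactly one value from that position,
    # ignoring the trailing ";" and the rest of the page.
    value, _end = json.JSONDecoder().raw_decode(source, match.end())
    return value


records = extract_js_json(html_source, 'initialAnalystData')
by_name = {record['name']: record for record in records}
date_range = extract_js_json(html_source, 'initialAnalystDate')

print(by_name['APA']['url'])
print(date_range['date_from'], date_range['date_to'])
```

With this approach the URL and the nested quote values stay intact, and the same helper reads the date range the page was loaded with from window.initialAnalystDate.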

Upvotes: 1
