Reputation: 21
Update: The script provided by Jonas has solved most of the problems. Now I am trying to find a way to set the date range with the date picker or send_keys, because the page only covers a single day every time I re-run the code.
date_start = driver.find_element(By.XPATH, 'date_from')
date_end = driver.find_element(By.XPATH, 'date_to')
date_start.send_keys("2021-09-24")
date_end.send_keys("2021-10-01")
Original Problem: I am using Selenium's Chrome WebDriver to extract data from a table on a website whose contents cannot be highlighted for copy and paste. When I tried to extract the data with BeautifulSoup, I found that it is embedded in a JavaScript function. The HTML code for the JavaScript table looks like this:
<script>
function initTableData() {
window.initialAnalystData = [{"action_company":"Initiates Coverage On","action_pt":"Announces","analyst":"BTIG","analyst_name":"James Sullivan","currency":"USD","lastTradePrice":24.89},"logo":null}];
window.initialAnalystDate = {"date_from":"2021-09-24","date_to":"2021-10-01"};
window.initialAnalystTime = "11:27";
}
initTableData();
</script>
I am new to both Selenium and JavaScript. I tried the following code to get the data list, but it does not work:
element = driver.findElement(By.tagName("script"));
htmlCode = driver.executeScript("return arguments[0].innerHTML;", element)
What should I try next? The website link is here.
Thanks!
Upvotes: 2
Views: 726
Reputation: 1769
You could use a regular expression to find that part of the page source and then work with it:
from selenium import webdriver
import time
import re

url = 'https://www.benzinga.com/analyst-ratings'
driver = webdriver.Chrome()  # start a Chrome session
driver.get(url)
time.sleep(5)  # let the page load all the data first
htmlSource = driver.page_source
# Take the first match, drop the 29-character prefix 'window.initialAnalystData = [',
# then split the rest into one chunk per '{'
raw_data = re.findall(r'window.initialAnalystData = .*;', htmlSource)[0][29:].split('{')[1:]

# Clean the data if you want (just one possible way out of many!):
cleaned_data = {}
for data in raw_data:
    clean_data = data.split(',')
    details_to_dic = {}
    for details in clean_data:
        details_temp = details.replace('"', '')
        details_temp = details_temp.split(':')
        try:
            details_to_dic[details_temp[0]] = details_temp[1]
        except IndexError:
            pass  # fragment without a key:value pair
    cleaned_data[details_to_dic['name']] = details_to_dic
This gives you the data as a dictionary keyed by company name (example data for company APA):
print(cleaned_data['APA'])
output:
{'action_company': 'Downgrades', 'action_pt': 'Lowers', 'analyst': 'Citigroup', 'analyst_name': 'Scott Gruber', 'currency': 'USD', 'date': '2021-10-01', 'exchange': 'NASDAQ', 'id': '61573ba273a5f300019bb64a', 'importance': '0', 'name': 'APA', 'notes': '', 'pt_current': '23.0000', 'pt_prior': '27.0000', 'rating_current': 'Neutral', 'rating_prior': 'Buy', 'ticker': 'APA', 'time': '12', 'updated': '1633106928', 'url': 'https', 'url_calendar': 'https', 'url_news': 'https', 'quote': ''}
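Note that splitting on ':' and ',' cuts off any value that contains those characters, which is why 'url' shows only 'https' and 'time' only '12' in the output above. A possible alternative, sketched here under the assumption that the value assigned to the window variable is valid JSON, is to let the json module parse it. The helper name extract_window_json and the sample string are made up for illustration; with Selenium you would pass driver.page_source instead of the sample.

```python
import json
import re


def extract_window_json(html_source, variable_name):
    """Parse the JSON value assigned to window.<variable_name> in html_source.

    Hypothetical helper; assumes the assigned value is valid JSON.
    """
    pattern = r'window\.' + re.escape(variable_name) + r'\s*=\s*'
    match = re.search(pattern, html_source)
    if match is None:
        raise ValueError('window.%s not found in page source' % variable_name)
    # raw_decode parses exactly one JSON value starting at the given index
    # and ignores whatever follows it (the trailing ';' and the rest of the page).
    value, _ = json.JSONDecoder().raw_decode(html_source, match.end())
    return value


# Shortened, made-up sample in the shape shown in the question
sample = '''<script>
window.initialAnalystData = [{"name":"APA","time":"12:48:48","url":"https://example.com/a"}];
window.initialAnalystDate = {"date_from":"2021-09-24","date_to":"2021-10-01"};
</script>'''

by_name = {row['name']: row for row in extract_window_json(sample, 'initialAnalystData')}
date_range = extract_window_json(sample, 'initialAnalystDate')
print(by_name['APA']['url'])
print(date_range['date_from'], date_range['date_to'])
```

Because the values are parsed rather than split, URLs, times and numbers keep their full content, and nested objects stay nested instead of being broken apart at each '{'.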
Upvotes: 1