I'm trying to retrieve the Fear & Greed index from http://money.cnn.com/data/fear-and-greed/. The index changes dynamically. When I inspect the element, I see the markup below. How can I use Python Selenium to get the 84 and the other index values? I tried the code below, but it only returns a blank string. Any ideas?
cr = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH,"//*[contains(text(), 'Fear & Greed Now')]")))
Here is the relevant page markup:
<div id="needleChart" style="background-image:url('http://money.cnn.com/.element/img/5.0/data/feargreed/1.png');">
<ul>
<li>Fear & Greed Now: 84 (Extreme Greed)</li>
<li>Fear & Greed Previous Close: 86 (Extreme Greed)</li>
<li>Fear & Greed 1 Week Ago: 89 (Extreme Greed)</li>
<li>Fear & Greed 1 Month Ago: 57 (Greed)</li>
<li>Fear & Greed 1 Year Ago: 16 (Extreme Fear)</li>
</ul>
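To experiment with the parsing side without launching a browser, here is a minimal offline sketch that reads the snippet above with the standard library's html.parser. It assumes the markup looks exactly like the snippet (a closing </div> is added so it is complete), and the class name LiTextCollector is hypothetical. It only sees the HTML string you feed it, so for live values you still need Selenium or the rendered page source.

```python
from html.parser import HTMLParser

# The snippet from the question, with a closing </div> added so it is complete
SNIPPET = """<div id="needleChart" style="background-image:url('http://money.cnn.com/.element/img/5.0/data/feargreed/1.png');">
<ul>
<li>Fear & Greed Now: 84 (Extreme Greed)
</li>
<li>Fear & Greed Previous Close: 86 (Extreme Greed)</li>
<li>Fear & Greed 1 Week Ago: 89 (Extreme Greed)</li>
<li>Fear & Greed 1 Month Ago: 57 (Greed)</li>
<li>Fear & Greed 1 Year Ago: 16 (Extreme Fear)</li>
</ul>
</div>"""


class LiTextCollector(HTMLParser):
    """Hypothetical helper: collects the text of each <li> inside div#needleChart."""

    def __init__(self):
        super().__init__()
        self.in_chart = False
        self.in_li = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "div" and dict(attrs).get("id") == "needleChart":
            self.in_chart = True
        elif tag == "li" and self.in_chart:
            self.in_li = True
            self.items.append("")  # start a new item; text arrives in handle_data

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_li = False
        elif tag == "div":
            self.in_chart = False  # good enough for this flat snippet (no nested divs)

    def handle_data(self, data):
        if self.in_li:
            self.items[-1] += data  # data may arrive in several chunks


parser = LiTextCollector()
parser.feed(SNIPPET)
items = [text.strip() for text in parser.items]  # drop the stray newline in the first <li>
for text in items:
    print(text)
```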
Try the following:
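chart = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "needleChart")))
# Selenium 4 replaced find_elements_by_tag_name() with find_elements(By.TAG_NAME, ...)
for li in chart.find_elements(By.TAG_NAME, "li"):
    text = li.get_attribute("innerHTML")
    # Only keep digits after the colon; otherwise the "1" in "1 Week Ago" is glued onto the value ("189")
    s = ''.join(x for x in text.split(":", 1)[1] if x.isdigit())
    print(s)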
Hope it helps...:)
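Joining every digit in the string also picks up digits from the label, so "1 Week Ago: 89" turns into "189". A small, browser-free sketch of that pitfall and of a fix that reads only the part after the first colon; index_value is a hypothetical helper name:

```python
def index_value(li_text):
    """Return the index from one <li> text, e.g. 'Fear & Greed 1 Week Ago: 89 (Extreme Greed)' -> 89."""
    # The label before the colon may itself contain digits ("1 Week Ago"), so skip it
    _, _, after_colon = li_text.partition(":")
    digits = "".join(ch for ch in after_colon if ch.isdigit())
    return int(digits)


text = "Fear & Greed 1 Week Ago: 89 (Extreme Greed)"
naive = "".join(ch for ch in text if ch.isdigit())  # every digit in the whole string
print(naive)              # → 189
print(index_value(text))  # → 89
```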
According to the WebDriver specification, .text returns only the rendered (visible) text, which I suspect comes back empty because of the unusual styling of the "needleChart" parent container.
Use innerHTML instead of .text to work around the "empty text" problem:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get("http://money.cnn.com/data/fear-and-greed/")
driver.maximize_window()
wait = WebDriverWait(driver, 10)
list_indexes = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "#needleChart")))
# Selenium 4 replaced find_elements_by_tag_name() with find_elements(By.TAG_NAME, ...)
indexes = list_indexes.find_elements(By.TAG_NAME, "li")
for index in indexes:
    # innerHTML is available even when .text comes back empty
    print(index.get_attribute("innerHTML"))
driver.quit()  # quit() ends the whole browser session; close() only closes the current window
Prints:
Fear & Greed Now: 86 (Extreme Greed)
Fear & Greed Previous Close: 86 (Extreme Greed)
Fear & Greed 1 Week Ago: 89 (Extreme Greed)
Fear & Greed 1 Month Ago: 57 (Greed)
Fear & Greed 1 Year Ago: 16 (Extreme Fear)
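If your output shows &amp; instead of &, that is expected: innerHTML returns serialized markup, and HTML serialization escapes a literal & as &amp;. A minimal sketch of undoing that with the standard library's html.unescape; the raw string is an assumed example, not captured output. Reading get_attribute("textContent") instead should give you the unescaped text directly.

```python
import html

# Assumed example of what innerHTML may return for the first <li>:
# the "&" is escaped and the trailing newline from the markup is kept
raw = "Fear &amp; Greed Now: 84 (Extreme Greed)\n"

text = html.unescape(raw).strip()  # decode entities, then trim whitespace
print(text)  # → Fear & Greed Now: 84 (Extreme Greed)
```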
You can then post-process these strings into a dictionary that maps each period to its index value:
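import re
# innerHTML may return "&" escaped as "&amp;", so accept both forms
pattern = re.compile(r"^Fear (?:&amp;|&) Greed (.*?): (\d+)")
# Run this before driver.quit(): the elements cannot be read once the session is closed
d = dict(pattern.search(index.get_attribute("innerHTML")).groups() for index in indexes)
print(d)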
Prints:
{
u'Previous Close': u'86',
u'Now': u'86',
u'1 Year Ago': u'16',
u'1 Week Ago': u'89',
u'1 Month Ago': u'57'
}
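The one-liner above raises AttributeError if any <li> fails to match (search() then returns None), and it keeps the values as strings. A slightly more defensive sketch that skips non-matching items and converts the values to int; the texts list is assumed sample data modeled on the printed output above, mixing both & forms on purpose, not scraped values:

```python
import re

# Assumed innerHTML strings; the first two use the escaped "&amp;" form
texts = [
    "Fear &amp; Greed Now: 86 (Extreme Greed)\n",
    "Fear &amp; Greed Previous Close: 86 (Extreme Greed)",
    "Fear & Greed 1 Week Ago: 89 (Extreme Greed)",
    "Fear & Greed 1 Month Ago: 57 (Greed)",
    "Fear & Greed 1 Year Ago: 16 (Extreme Fear)",
]

pattern = re.compile(r"^Fear (?:&amp;|&) Greed (.*?): (\d+)")

readings = {}
for text in texts:
    match = pattern.search(text)
    if match is None:  # skip items that are not index lines instead of crashing
        continue
    period, value = match.groups()
    readings[period] = int(value)

print(readings)
# → {'Now': 86, 'Previous Close': 86, '1 Week Ago': 89, '1 Month Ago': 57, '1 Year Ago': 16}
```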
You can locate the element and read its innerHTML:
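from selenium.webdriver.common.by import By
# Call this on your driver instance (not the webdriver module); Selenium 4 uses find_element(By.XPATH, ...)
element = driver.find_element(By.XPATH, "//div[@id='needleChart']/ul/li")
text = element.get_attribute("innerHTML")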
text will then contain only the first item, because find_element returns just the first match:
Fear & Greed Now: 86 (Extreme Greed)
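Then you can use a regular expression to extract the index value from that string. To get all five readings, use find_elements with the same XPath and loop over the results.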
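For the regex step, a minimal sketch that pulls both the number and the rating label out of a single line; the helper name parse_reading and the LINE pattern are illustrative choices, not part of the answer:

```python
import re

# Hypothetical pattern: the number after the colon, then the label in parentheses
LINE = re.compile(r":\s*(?P<value>\d+)\s*\((?P<rating>[^)]*)\)")


def parse_reading(text):
    """Return (value, rating) from e.g. 'Fear & Greed Now: 86 (Extreme Greed)', or None if no match."""
    match = LINE.search(text)
    if match is None:
        return None
    return int(match.group("value")), match.group("rating")


print(parse_reading("Fear & Greed Now: 86 (Extreme Greed)"))  # → (86, 'Extreme Greed')
```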