teapot

Reputation: 77

How to use Selenium to get this index?

I'm trying to retrieve the Fear & Greed index from http://money.cnn.com/data/fear-and-greed/. The index changes dynamically. When I inspect the element, I see the markup below. How can I use Python Selenium to get the 84 and the other index values? I tried the code below but only got a blank result. Any ideas?

cr = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH,"//*[contains(text(), 'Fear & Greed Now')]")))

Below is the page's HTML:

<div id="needleChart" style="background-image:url('http://money.cnn.com/.element/img/5.0/data/feargreed/1.png');">
<ul>
<li>Fear &amp; Greed Now: 84 (Extreme Greed)
</li>
<li>Fear &amp; Greed Previous Close: 86 (Extreme Greed)</li>
<li>Fear &amp; Greed 1 Week Ago: 89 (Extreme Greed)</li>
<li>Fear &amp; Greed 1 Month Ago: 57 (Greed)</li>
<li>Fear &amp; Greed 1 Year Ago: 16 (Extreme Fear)</li>
</ul>

Upvotes: 3

Views: 2569

Answers (3)

Saurabh Gaur

Reputation: 23825

Try the following:

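import re

elements = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.ID, "needleChart"))).find_elements(By.TAG_NAME, "li")

for li in elements:
    text = li.get_attribute("innerHTML")
    # Take the number after the colon; joining every digit would turn "1 Week Ago: 89" into "189".
    match = re.search(r":\s*(\d+)", text)
    if match:
        print(match.group(1))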

Hope it helps...:)

Upvotes: 0

alecxe

Reputation: 474161

According to the specification, .text returns only the rendered text by default, which, I suspect, comes back empty here because of the unusual styling of the "needleChart" parent container.

You need to use innerHTML instead of .text to work around the "empty text" problem:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Firefox()
driver.get("http://money.cnn.com/data/fear-and-greed/")
driver.maximize_window()

wait = WebDriverWait(driver, 10)
list_indexes = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "#needleChart")))

indexes = list_indexes.find_elements(By.TAG_NAME, "li")
for index in indexes:
    print(index.get_attribute("innerHTML"))

driver.close()

Prints:

Fear &amp; Greed Now: 86 (Extreme Greed)
Fear &amp; Greed Previous Close: 86 (Extreme Greed)
Fear &amp; Greed 1 Week Ago: 89 (Extreme Greed)
Fear &amp; Greed 1 Month Ago: 57 (Greed)
Fear &amp; Greed 1 Year Ago: 16 (Extreme Fear)
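Because innerHTML returns markup rather than rendered text, the ampersand stays escaped as &amp;. If plain text is wanted, the standard library's html.unescape can decode it. A minimal sketch, using two of the strings printed above as sample input (the raw_items and decoded_items names are only illustrative):

```python
import html

# Sample innerHTML strings, copied from the output above.
raw_items = [
    "Fear &amp; Greed Now: 86 (Extreme Greed)",
    "Fear &amp; Greed 1 Year Ago: 16 (Extreme Fear)",
]

# html.unescape turns entities such as &amp; back into plain characters.
decoded_items = [html.unescape(item).strip() for item in raw_items]

for item in decoded_items:
    print(item)
```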

You can then post-process these texts and make a nice result dictionary, extracting the period as a key and the index as a value:

import re

pattern = re.compile(r"^Fear &amp; Greed (.*?): (\d+)")
d = dict(pattern.search(index.get_attribute("innerHTML")).groups() for index in indexes)
print(d)

Prints:

{
    u'Previous Close': u'86', 
    u'Now': u'86', 
    u'1 Year Ago': u'16', 
    u'1 Week Ago': u'89', 
    u'1 Month Ago': u'57'
}
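The values in this dictionary are strings. If integers and the sentiment label are needed as well, the pattern can be extended with named groups. This is only a sketch, assuming every item keeps the "Fear &amp; Greed <period>: <number> (<label>)" shape seen above; ITEM_PATTERN, parse_item and the sample strings are illustrative names, not part of the code above:

```python
import re

# Assumed shape: "Fear &amp; Greed <period>: <number> (<label>)"
ITEM_PATTERN = re.compile(
    r"Fear &amp; Greed (?P<period>.*?): (?P<value>\d+) \((?P<label>[^)]*)\)"
)


def parse_item(inner_html):
    """Return (period, value, label), or None if the text does not match."""
    match = ITEM_PATTERN.search(inner_html)
    if match is None:
        return None
    return match.group("period"), int(match.group("value")), match.group("label")


samples = [
    "Fear &amp; Greed Now: 86 (Extreme Greed)",
    "Fear &amp; Greed 1 Month Ago: 57 (Greed)",
]
parsed = [parse_item(s) for s in samples]
print(parsed)
```

Returning None for a non-matching item keeps a change in the page's wording from raising an AttributeError in the middle of a loop.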

Upvotes: 2

Yu Zhang

Reputation: 2149

You can get it by locating the element and reading its innerHTML:

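from selenium.webdriver.common.by import By
# driver is your WebDriver instance (for example webdriver.Firefox()), not the webdriver module
element = driver.find_element(By.XPATH, "//div[@id='needleChart']/ul/li")
text = element.get_attribute("innerHTML")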

text will then hold the innerHTML of the first list item:

Fear &amp; Greed Now: 86 (Extreme Greed)

You can then use a regex to extract the index from this string.
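One way to do that extraction, as a sketch: take the first run of digits after the colon, so that the "1" in labels such as "1 Week Ago" is not picked up. The extract_index name is made up for this example.

```python
import re


def extract_index(text):
    """Return the integer that follows the colon, or None if there is none."""
    match = re.search(r":\s*(\d+)", text)
    return int(match.group(1)) if match else None


print(extract_index("Fear & Greed Now: 86 (Extreme Greed)"))
print(extract_index("Fear &amp; Greed 1 Week Ago: 89 (Extreme Greed)"))
```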

Upvotes: 1
