Jaspal Singh Rathour

Reputation: 699

Scrape values from Website using Selenium

I am trying to extract data from the following website:

https://www.tipranks.com/stocks/sui/stock-analysis

I am targeting the value "6" shown in the octagon on the page.


I believe I am targeting the correct xpath.

Here is my code:

import sys
import os
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium import webdriver

os.environ['MOZ_HEADLESS'] = '1'
binary = FirefoxBinary('C:/Program Files/Mozilla Firefox/firefox.exe', log_file=sys.stdout)

browser = webdriver.PhantomJS(service_args=["--load-images=no", '--disk-cache=true'])

url = 'https://www.tipranks.com/stocks/sui/stock-analysis'
xpath = '/html/body/div[1]/div/div/div/div/main/div/div/article/div[2]/div/main/div[1]/div[2]/section[1]/div[1]/div[1]/div/svg/text/tspan'
browser.get(url)

element = browser.find_element_by_xpath(xpath)

print(element)

Here is the error that I get back:

Traceback (most recent call last):
  File "C:/Users/jaspa/PycharmProjects/ig-markets-api-python-library/trader/market_signal_IV_test.py", line 15, in <module>
    element = browser.find_element_by_xpath(xpath)
  File "C:\Users\jaspa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
    return self.find_element(by=By.XPATH, value=xpath)
  File "C:\Users\jaspa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
    'value': value})['value']
  File "C:\Users\jaspa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\jaspa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: {"errorMessage":"Unable to find element with xpath '/html/body/div[1]/div/div/div/div/main/div/div/article/div[2]/div/main/div[1]/div[2]/section[1]/div[1]/div[1]/div/svg/text/tspan'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Content-Length":"96","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:51786","User-Agent":"selenium/3.141.0 (python windows)"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"xpath\", \"value\": \"/h3/div/span\", \"sessionId\": \"d8e91c70-9139-11e9-a9c9-21561f67b079\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/d8e91c70-9139-11e9-a9c9-21561f67b079/element"}}
Screenshot: available via screen

I can see that the issue is due to incorrect xpath, but can't figure out why.

I should also point out that Selenium seemed to me the best method for scraping this site; I intend to extract other values and repeat these queries for different stocks across a number of pages. If anybody thinks I would be better off with BeautifulSoup, lxml, etc., I am happy to hear suggestions!

Thanks in advance!

Upvotes: 2

Views: 1243

Answers (3)

MITHU

Reputation: 154

You can try the CSS selector [class$='shape__Octagon'] to target the content. If I went with pyppeteer, I would do it like the following:

import asyncio
from pyppeteer import launch

async def get_content(url):
    browser = await launch({"headless":True})
    [page] = await browser.pages()
    await page.goto(url)
    await page.waitForSelector("[class$='shape__Octagon']")
    value = await page.querySelectorEval("[class$='shape__Octagon']","e => e.innerText")
    return value

if __name__ == "__main__":
    url = "https://www.tipranks.com/stocks/sui/stock-analysis"
    loop = asyncio.get_event_loop()
    result = loop.run_until_complete(get_content(url))
    print(result.strip())

Output:

6
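
If you need the same value for several tickers, as mentioned in the question, the get_content() coroutine above can be reused in a loop. A rough sketch, assuming the URL pattern from the question holds for every ticker and using a hypothetical ticker list:

import asyncio

async def scrape_tickers(tickers):
    results = {}
    for ticker in tickers:
        # Assumed URL pattern, based on the one in the question
        url = f"https://www.tipranks.com/stocks/{ticker}/stock-analysis"
        # get_content() (defined above) launches a fresh browser per call,
        # which is fine for a handful of tickers
        value = await get_content(url)
        results[ticker] = value.strip()
    return results

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    print(loop.run_until_complete(scrape_tickers(["sui", "aapl"])))  # hypothetical tickers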

Upvotes: 2

Matt Blaha

Reputation: 977

You seem to have two issues here:

For the xpath, I just did:

xpath = '//div[@class="client-components-ValueChange-shape__Octagon"]'

And then do:

print(element.text)

And it gets the value you want. However, your code doesn't actually wait for the page to finish loading before evaluating the xpath. For me, using Firefox, I only get the value about 40% of the time this way. There are many ways to handle this with Selenium; the simplest is probably to just sleep for a few seconds between the browser.get and the xpath statement, or to use an explicit wait, as sketched below.
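
A minimal sketch of the explicit-wait approach, assuming the Firefox driver and the class name used in the xpath above:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Firefox()
browser.get('https://www.tipranks.com/stocks/sui/stock-analysis')

# Wait up to 10 seconds for the octagon div to be present before reading its text
element = WebDriverWait(browser, 10).until(
    EC.presence_of_element_located(
        (By.XPATH, '//div[@class="client-components-ValueChange-shape__Octagon"]')
    )
)
print(element.text)

browser.quit()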

Also, you seem to be setting up Firefox but then using PhantomJS. I did not try this with PhantomJS; the sleep may be unnecessary there.

Upvotes: 1

Omer Tekbiyik

Reputation: 4744

You don't even need to declare the whole path. The octagon value is inside a div with the class client-components-ValueChange-shape__Octagon, so search for that div.

x = browser.find_elements_by_css_selector("div[class='client-components-ValueChange-shape__Octagon']")  # select every div with that exact class
for element in x:
    print(element.text)

Output :

6

Upvotes: 2
