Reputation: 699
I am trying to extract data from the following website:
I am targeting the value "6" in the octagon:
I believe I am targeting the correct xpath.
Here is my code:
import sys
import os
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
from selenium import webdriver
os.environ['MOZ_HEADLESS'] = '1'
binary = FirefoxBinary('C:/Program Files/Mozilla Firefox/firefox.exe', log_file=sys.stdout)
browser = webdriver.PhantomJS(service_args=["--load-images=no", '--disk-cache=true'])
url = 'https://www.tipranks.com/stocks/sui/stock-analysis'
xpath = '/html/body/div[1]/div/div/div/div/main/div/div/article/div[2]/div/main/div[1]/div[2]/section[1]/div[1]/div[1]/div/svg/text/tspan'
browser.get(url)
element = browser.find_element_by_xpath(xpath)
print(element)
Here is the error that I get back:
Traceback (most recent call last):
File "C:/Users/jaspa/PycharmProjects/ig-markets-api-python-library/trader/market_signal_IV_test.py", line 15, in <module>
element = browser.find_element_by_xpath(xpath)
File "C:\Users\jaspa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 394, in find_element_by_xpath
return self.find_element(by=By.XPATH, value=xpath)
File "C:\Users\jaspa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 978, in find_element
'value': value})['value']
File "C:\Users\jaspa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
self.error_handler.check_response(response)
File "C:\Users\jaspa\AppData\Local\Programs\Python\Python36-32\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.NoSuchElementException: Message: {"errorMessage":"Unable to find element with xpath '/html/body/div[1]/div/div/div/div/main/div/div/article/div[2]/div/main/div[1]/div[2]/section[1]/div[1]/div[1]/div/svg/text/tspan'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Content-Length":"96","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:51786","User-Agent":"selenium/3.141.0 (python windows)"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"xpath\", \"value\": \"/h3/div/span\", \"sessionId\": \"d8e91c70-9139-11e9-a9c9-21561f67b079\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"","host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/d8e91c70-9139-11e9-a9c9-21561f67b079/element"}}
Screenshot: available via screen
I can see that the issue is due to incorrect xpath, but can't figure out why.
I should also point out that Selenium struck me as the best method to scrape this site; I intend to extract other values and repeat these queries for different stocks across a number of pages. If anybody thinks I would be better off with BeautifulSoup, lxml, etc., I am happy to hear suggestions!
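Side note on the repeat-per-stock plan: the stock-analysis URLs appear to follow a single pattern, so each page URL can be generated from a ticker symbol and fed to the same scraping routine. A minimal sketch (the build_url helper and the ticker list are my own illustration, not from the original post):

```python
# Sketch: generate per-ticker tipranks URLs for repeated scraping.
# build_url and the ticker list below are illustrative assumptions.

def build_url(ticker):
    """Build a stock-analysis URL for a given ticker symbol."""
    return "https://www.tipranks.com/stocks/{}/stock-analysis".format(ticker.lower())

tickers = ["SUI", "AAPL", "MSFT"]
for t in tickers:
    # Each URL can then be passed to the same scraping function.
    print(build_url(t))
```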
Thanks in advance!
Upvotes: 2
Views: 1243
Reputation: 154
You can try the css selector [class$='shape__Octagon'] to target the content. If I went for pyppeteer, I would do it like the following:
import asyncio
from pyppeteer import launch

async def get_content(url):
    browser = await launch({"headless": True})
    [page] = await browser.pages()
    await page.goto(url)
    await page.waitForSelector("[class$='shape__Octagon']")
    value = await page.querySelectorEval("[class$='shape__Octagon']", "e => e.innerText")
    return value

if __name__ == "__main__":
    url = "https://www.tipranks.com/stocks/sui/stock-analysis"
    loop = asyncio.get_event_loop()
    result = loop.run_until_complete(get_content(url))
    print(result.strip())
Output:
6
Upvotes: 2
Reputation: 977
You seem to have two issues here.
For the xpath, I just did:
xpath = '//div[@class="client-components-ValueChange-shape__Octagon"]'
And then do:
print(element.text)
And it gets the value you want. However, your code doesn't actually wait for the browser to finish loading the page before evaluating the xpath. For me, using Firefox, I only get the value about 40% of the time this way. There are many ways to handle this with Selenium; the simplest is probably to sleep for a few seconds between the browser.get call and the xpath lookup.
The second issue: you seem to be setting up Firefox but then using PhantomJS. I did not try this with PhantomJS, so the sleep may be unnecessary there.
Upvotes: 1
Reputation: 4744
You don't even need to declare the full path. The octagon is in a div with the class client-components-ValueChange-shape__Octagon, so just search for that div:
x = browser.find_elements_by_css_selector("div[class='client-components-ValueChange-shape__Octagon']")  # select by class
for element in x:
    print(element.text)
Output:
6
Upvotes: 2