Bebbi

Reputation: 1

Web scraping Yahoo Finance with Selenium

It's driving me nuts: I can't find the answer to my problem. I wrote some Python code with Selenium to scrape the Yahoo Finance news for Daimler, but it simply doesn't work. I always get the following message in PyCharm:

selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified
  (Session info: chrome=85.0.4183.121)

But I am quite sure that the selector I chose is the only appropriate one. Here is my code:

from selenium import webdriver
import pandas as pd


url = 'https://finance.yahoo.com/quote/DAI.DE?p=DAI.DE&.tsrc=fin-srch'


driver = webdriver.Chrome('C:/Users/Startklar/Desktop/CFDS/chromedriver.exe')
driver.get(url)


driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")


articles = driver.find_elements_by_class_name('js-stream-content Pos(r)')


for article in articles:
    source = article.find_element_by_xpath('.//*[@id="quoteNewsStream-0-Stream"]/ul/li[3]/div/div/div[2]/div/span[1]').text
    title = article.find_element_by_xpath('//*[@id="quoteNewsStream-0-Stream"]/ul/li[5]/div/div/div[2]/h3/a').text
    text = article.find_element_by_xpath('//*[@id="quoteNewsStream-0-Stream"]/ul/li[5]/div/div/div[2]/p').text
    date = article.find_element_by_xpath('.//*[@id="quoteNewsStream-0-Stream"]/ul/li[5]/div/div/div[1]/div/span[2]').text

    print(source,title,text,date)

What's wrong? I'd really appreciate some help!

Thanks a lot


Maybe it is useful to see the whole error message:

Traceback (most recent call last):
  File "C:/Users/Startklar/PycharmProjects/test/venv/Selenium Test.py", line 15, in <module>
    articles = driver.find_elements_by_css_selector('li.js-stream-content Pos(r)')
  File "C:\Users\Startklar\PycharmProjects\test\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 614, in find_elements_by_css_selector
    return self.find_elements(by=By.CSS_SELECTOR, value=css_selector)
  File "C:\Users\Startklar\PycharmProjects\test\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 1007, in find_elements
    'value': value})['value'] or []
  File "C:\Users\Startklar\PycharmProjects\test\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\Startklar\PycharmProjects\test\venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified
  (Session info: chrome=85.0.4183.121)

This is the latest code, by the way:

from selenium import webdriver
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


url = 'https://finance.yahoo.com/quote/DAI.DE?p=DAI.DE&.tsrc=fin-srch'


driver = webdriver.Chrome('C:/Users/Startklar/Desktop/CFDS/chromedriver.exe')
driver.get(url)


driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")


articles = WebDriverWait(driver, 100).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "li.js-stream-content")))


for article in articles:
    try:
        source = article.find_element_by_xpath('//div/div/div[2]/div/span[1]').text
        title = article.find_element_by_xpath('//div/div/div[2]/h3/a').text
        text = article.find_element_by_xpath('//div/div/div[2]/p').text
        date = article.find_element_by_xpath('//div/div/div[2]/div/span[2]').text
        print(source,title,text,date+'\n')
    except:
        print("")

Upvotes: 0

Views: 550

Answers (1)

Arundeep Chohan

Reputation: 9969

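Your XPaths and article selector were off. 'js-stream-content Pos(r)' is two class names, and the unescaped parentheses make it an invalid selector, so wait for li.js-stream-content instead: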

articles = WebDriverWait(driver, 30).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "li.js-stream-content")))

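for article in articles:
    try:
        # The leading '.' keeps each XPath relative to the current article;
        # a bare '//' would search the whole page on every iteration.
        source = article.find_element_by_xpath('.//div/div/div[2]/div/span[1]').text
        title = article.find_element_by_xpath('.//div/div/div[2]/h3/a').text
        text = article.find_element_by_xpath('.//div/div/div[2]/p').text
        date = article.find_element_by_xpath('.//div/div/div[1]/div/span[2]').text
        print(source, title, text, date + '\n')
    except Exception:
        # Skip list items that do not match the expected layout.
        print("")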
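
When an XPath is evaluated from an element in Selenium, a leading // still searches the whole document, whereas .// stays inside that element. The sketch below shows the difference without a browser, using the standard library's xml.etree.ElementTree on made-up markup. The HTML string and its headlines are assumptions for illustration only, not the real Yahoo Finance page, and the document-wide case is imitated by always searching from the root.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the news stream.
HTML = """
<ul>
  <li class="js-stream-content"><div><h3><a>First headline</a></h3><p>First summary</p></div></li>
  <li class="js-stream-content"><div><h3><a>Second headline</a></h3><p>Second summary</p></div></li>
</ul>
"""

root = ET.fromstring(HTML)
articles = root.findall(".//li[@class='js-stream-content']")

# Relative lookup: each search starts at the current <li>.
titles = [article.find(".//h3/a").text for article in articles]

# Document-wide lookup repeated once per article: always the first match.
repeated = [root.find(".//h3/a").text for _ in articles]

print(titles)
print(repeated)
```

The relative version yields one headline per article, while the document-wide version repeats the first headline for every article, which is the symptom to look for when a scraping loop prints the same row again and again.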

Import

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC
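
If you do want to match on both classes, the class attribute has to be turned into a compound CSS selector: one dot per class, with characters such as parentheses backslash-escaped. Below is a minimal sketch of that conversion. The helper name css_selector_for_classes is hypothetical (it is not part of Selenium), and the escaping is simplified: it does not handle class names that start with a digit.

```python
import re


def css_selector_for_classes(class_attr: str) -> str:
    """Build a compound CSS selector from a space-separated class attribute."""
    parts = []
    for name in class_attr.split():
        # Backslash-escape every character that is not a letter, digit, '-' or '_'.
        escaped = re.sub(r"([^A-Za-z0-9_-])", r"\\\1", name)
        parts.append("." + escaped)
    return "".join(parts)


selector = css_selector_for_classes("js-stream-content Pos(r)")
print(selector)
```

The resulting string could then be passed to a By.CSS_SELECTOR lookup, prefixed with li if only list items should match. Selecting on js-stream-content alone, as above, is simpler and avoids depending on the layout class.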

Upvotes: 1
