Reputation: 1
It's driving me nuts, but I really can't find the answer to my problem. I have written some code with Python and Selenium to scrape the Yahoo Finance news for Daimler, but it simply doesn't work. I always get the following message in PyCharm:
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified
(Session info: chrome=85.0.4183.121)
But I am quite sure that the selector I chose is the only appropriate one. Here is my code:
from selenium import webdriver
import pandas as pd
url = 'https://finance.yahoo.com/quote/DAI.DE?p=DAI.DE&.tsrc=fin-srch'
driver = webdriver.Chrome('C:/Users/Startklar/Desktop/CFDS/chromedriver.exe')
driver.get(url)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
articles = driver.find_elements_by_class_name('js-stream-content Pos(r)')
for article in articles:
    source = article.find_element_by_xpath('.//*[@id="quoteNewsStream-0-Stream"]/ul/li[3]/div/div/div[2]/div/span[1]').text
    title = article.find_element_by_xpath('//*[@id="quoteNewsStream-0-Stream"]/ul/li[5]/div/div/div[2]/h3/a').text
    text = article.find_element_by_xpath('//*[@id="quoteNewsStream-0-Stream"]/ul/li[5]/div/div/div[2]/p').text
    date = article.find_element_by_xpath('.//*[@id="quoteNewsStream-0-Stream"]/ul/li[5]/div/div/div[1]/div/span[2]').text
    print(source, title, text, date)
What's wrong? I'd really appreciate some help!
Thanks a lot
It may help to see the whole error message:
Traceback (most recent call last):
  File "C:/Users/Startklar/PycharmProjects/test/venv/Selenium Test.py", line 15, in <module>
    articles = driver.find_elements_by_css_selector('li.js-stream-content Pos(r)')
  File "C:\Users\Startklar\PycharmProjects\test\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 614, in find_elements_by_css_selector
    return self.find_elements(by=By.CSS_SELECTOR, value=css_selector)
  File "C:\Users\Startklar\PycharmProjects\test\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 1007, in find_elements
    'value': value})['value'] or []
  File "C:\Users\Startklar\PycharmProjects\test\venv\lib\site-packages\selenium\webdriver\remote\webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "C:\Users\Startklar\PycharmProjects\test\venv\lib\site-packages\selenium\webdriver\remote\errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified
  (Session info: chrome=85.0.4183.121)
This is my latest code, by the way:
from selenium import webdriver
import pandas as pd
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = 'https://finance.yahoo.com/quote/DAI.DE?p=DAI.DE&.tsrc=fin-srch'
driver = webdriver.Chrome('C:/Users/Startklar/Desktop/CFDS/chromedriver.exe')
driver.get(url)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
articles = WebDriverWait(driver, 100).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "li.js-stream-content")))
for article in articles:
    try:
        source = article.find_element_by_xpath('//div/div/div[2]/div/span[1]').text
        title = article.find_element_by_xpath('//div/div/div[2]/h3/a').text
        text = article.find_element_by_xpath('//div/div/div[2]/p').text
        date = article.find_element_by_xpath('//div/div/div[2]/div/span[2]').text
        print(source, title, text, date+'/n')
    except:
        print("")
Upvotes: 0
Views: 550
Reputation: 9969
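Your XPaths and article selector were off. Neither the class-name locator nor a CSS selector accepts 'js-stream-content Pos(r)': after the space, 'Pos(r)' is parsed as an element name, and the parentheses make the selector invalid, which is what raises the InvalidSelectorException. Selecting on li.js-stream-content alone is enough.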
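If you do want to match on both classes, drop the space and backslash-escape the parentheses, i.e. li.js-stream-content.Pos\(r\). The snippet below is a minimal sketch, not part of Selenium: css_class_selector is a hypothetical helper, and it escapes only a subset of ASCII punctuation. That is an assumption that covers atomic class names such as Pos(r); it is not a complete CSS.escape() implementation.

```python
import re

# Hypothetical helper, not a Selenium API.
# A subset of ASCII punctuation that CSS would otherwise read as selector syntax.
# Hyphen and underscore are valid inside identifiers, so they are left alone.
_SPECIAL = re.compile(r'([!"#$%&()*+,./:;<=>?@\[\]^`{|}~])')


def css_class_selector(tag, classes):
    """Build a compound selector such as li.a.b, escaping special characters."""
    escaped = [_SPECIAL.sub(r'\\\1', cls) for cls in classes]
    return tag + ''.join('.' + cls for cls in escaped)


selector = css_class_selector('li', ['js-stream-content', 'Pos(r)'])
print(selector)  # li.js-stream-content.Pos\(r\)
```

The resulting string could then be passed to find_elements_by_css_selector in place of the invalid one.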
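from selenium.common.exceptions import NoSuchElementException

articles = WebDriverWait(driver, 30).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "li.js-stream-content")))
for article in articles:
    try:
        # './/' keeps each XPath relative to the current <li>; a bare '//' searches the whole page
        source = article.find_element(By.XPATH, './/div/div/div[2]/div/span[1]').text
        title = article.find_element(By.XPATH, './/div/div/div[2]/h3/a').text
        text = article.find_element(By.XPATH, './/div/div/div[2]/p').text
        date = article.find_element(By.XPATH, './/div/div/div[1]/div/span[2]').text
        print(source, title, text, date + '\n')
    except NoSuchElementException:
        # some stream items may lack one of these fields; skip them
        print("")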
Imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
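Why the XPaths inside the loop should start with './/': in Selenium, an XPath beginning with '//' is evaluated from the document root even when you call it on an article element, so every iteration returns the first match on the page. The rough, browser-free illustration below uses the standard library's xml.etree.ElementTree as a stand-in for the DOM. The markup is made up, not Yahoo's real HTML, and because ElementTree supports only a subset of XPath, the "from the root" behaviour is simulated by searching from the root element.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for the news stream: two <li> items, each with its own headline.
HTML = """
<ul>
  <li class="js-stream-content"><div><h3><a>First headline</a></h3></div></li>
  <li class="js-stream-content"><div><h3><a>Second headline</a></h3></div></li>
</ul>
"""

root = ET.fromstring(HTML.strip())
items = root.findall(".//li")

# Simulates '//h3/a' in a browser: the search starts at the root,
# so every item yields the first headline on the page.
from_root = [root.find(".//h3/a").text for _ in items]

# './/h3/a' searches only inside the current item.
relative = [item.find(".//h3/a").text for item in items]

print(from_root)  # ['First headline', 'First headline']
print(relative)   # ['First headline', 'Second headline']
```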
Upvotes: 1