Reputation: 37
Using the code below, I wanted to extract the gold price with XPath and then use linear regression to make basic predictions.
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from sklearn.linear_model import LinearRegression
import time
import numpy as np
from sklearn.svm import SVR
import pytz
from datetime import datetime
from sys import argv
import os, psutil
################################################
if len(argv) != 5:
    print (argv[0] + '<train count> <timeout(s)> <predict date(Y/M/D)> <predict clock(H:M:S)>')
    sys.exit(2)
X_predict = [(int(datetime.strptime(argv[3] + " " + argv[4], '%Y/%m/%d %H:%M:%S').timestamp()*(10000000)))]
################################################
X=[]
y=[]
#driver = webdriver.Chrome()
driver = webdriver.Chrome(ChromeDriverManager().install())
driver.get('https://goldprice.org/live-gold-price.html')
elem_xpath = '//[@id="gpxtickerLeft_price"]'
for i in range(1, int(argv[1])):
    try:
        elem = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, elem_xpath)))
        print ("train => ", i)
        X.append(int(time.time()*(10000000)))
        y.append(int(elem.text.replace(',', '')))
        time.sleep(int(argv[2]))
    finally:
        driver.quit
##############################################
X = np.array(X).reshape(-1, 1)
y = np.array(y).reshape(-1, 1)
X_predict = np.array(X_predict).reshape(-1, 1)
##############################################
svr_rbf = LinearRegression()
y_rbf = svr_rbf.fit(X,y).predict(X_predict)
##########################################
#print ('X:'.format(X))
#print ('y:'.format(y))
#print ('X_predict:{}'.format(X_predict))
##########################################
print ('y_rbf: {}'.format(int(y_rbf)))
print('memory usage: {} MB'.format(
int(psutil.Process(os.getpid()).memory_info().rss/1024/1024)
))
But after executing the script I get the following error:
C:\Users\Lev\Desktop>python mls.py 6 3 2020/12/11 12:43:06
[WDM] - ====== WebDriver manager ======
[WDM] - Current google-chrome version is 90.0.4430
[WDM] - Get LATEST driver version for 90.0.4430
[WDM] - Driver [C:\Users\Lev\.wdm\drivers\chromedriver\win32\90.0.4430.24\chrome
driver.exe] found in cache
DevTools listening on ws://127.0.0.1:6275/devtools/browser/10d6bc25-3034-4ca7-a4
37-c0cf39c86274
[4412:5028:0514/123522.805:ERROR:device_event_log_impl.cc(214)] [12:35:22.805] F
IDO: webauthn_api.cc:54 Windows WebAuthn API failed to load
[5524:4228:0514/123532.459:ERROR:ssl_client_socket_impl.cc(947)] handshake faile
d; returned -1, SSL error code 1, net_error -100
[5524:4228:0514/123533.786:ERROR:ssl_client_socket_impl.cc(947)] handshake faile
d; returned -1, SSL error code 1, net_error -100
[5524:4228:0514/123538.624:ERROR:ssl_client_socket_impl.cc(947)] handshake faile
d; returned -1, SSL error code 1, net_error -100
[5524:4228:0514/123538.825:ERROR:ssl_client_socket_impl.cc(947)] handshake faile
d; returned -1, SSL error code 1, net_error -100
Traceback (most recent call last):
File "mls.py", line 32, in <module>
elem = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.
XPATH, elem_xpath)))
File "D:\Python38\lib\site-packages\selenium\webdriver\support\wait.py", line
80, in until
raise TimeoutException(message, screen, stacktrace)
selenium.common.exceptions.TimeoutException: Message:
The line "[5524:4228:0514/123533.786:ERROR:ssl_client_socket_impl.cc(947)] handshake failed; returned -1, SSL error code 1, net_error -100" just keeps getting spammed.
I guess the XPath is wrong.
Upvotes: 1
Views: 410
Reputation: 4870
XPath should be
//span[@id="gpxtickerLeft_price"]
You used:
//[@id="gpxtickerLeft_price"]
The part in [] is called the predicate. See this page for some examples.
A predicate needs a node or attribute in front of it to filter on, and // on its own is not a node.
Node examples:
//div
//*
//text()
//@id
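To see the difference between the two expressions without opening a browser, here is a minimal sketch using Python's standard xml.etree.ElementTree. It supports only a subset of XPath and needs relative paths (hence the leading dot), so it is not how Selenium evaluates XPath; it only illustrates that a predicate with no node test in front of it is rejected. The HTML fragment and the helper name try_xpath are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for the page markup; the real page is much larger.
FRAGMENT = '<div><span id="gpxtickerLeft_price">1,834.50</span></div>'


def try_xpath(path):
    """Return the text of every match, or None if the path is rejected."""
    root = ET.fromstring(FRAGMENT)
    try:
        return [node.text for node in root.findall(path)]
    except SyntaxError:
        # ElementTree raises SyntaxError for a path it cannot parse.
        return None


if __name__ == "__main__":
    for path in ('.//span[@id="gpxtickerLeft_price"]',  # tag name + predicate
                 './/*[@id="gpxtickerLeft_price"]',     # wildcard + predicate
                 './/[@id="gpxtickerLeft_price"]'):     # predicate with no node
        print(path, "->", try_xpath(path))
```

The first two paths find the span; the third is refused because nothing comes between // and the predicate.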
Upvotes: 1
Reputation: 156
I think the problem may also be your browser driver version. In the logs, I can see that you have Google Chrome version 90.0.4430, but the chromedriver version looks old.
Please try stopping any running chromedriver.exe process by opening a command prompt and running: taskkill /F /IM chromedriver.exe
Then install a new chromedriver.exe from here (choose the build that matches your machine).
Use it in your code.
Upvotes: 1
Reputation: 29382
The id gpxtickerLeft_price matches three web elements (two of the three have a prefix). You now have two options:
Use find_elements.
Write three different locators, one for each web element.
Code:
elem = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, 'gpxtickerLeft_price')))
Read more about why ID is preferred over XPath here.
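The find_elements option could look roughly like the sketch below. It assumes only that the driver object has a find_elements(by, value) method and that each element has .text and .is_displayed(), as Selenium's Python bindings provide. The helper visible_texts and the FakeElement/FakeDriver stand-ins are hypothetical names added here so the snippet runs without a browser; with a real driver you would pass the driver itself.

```python
def visible_texts(driver, element_id):
    """Return the text of each displayed element that carries element_id."""
    # "id" is the string value of selenium.webdriver.common.by.By.ID.
    matches = driver.find_elements("id", element_id)
    return [el.text for el in matches if el.is_displayed()]


# Hypothetical stand-ins so the sketch can run without Selenium or a browser.
class FakeElement:
    def __init__(self, text, displayed):
        self.text = text
        self._displayed = displayed

    def is_displayed(self):
        return self._displayed


class FakeDriver:
    def __init__(self, elements):
        self._elements = elements

    def find_elements(self, by, value):
        # Ignores the locator and returns the canned elements.
        return list(self._elements)


if __name__ == "__main__":
    fake = FakeDriver([FakeElement("", False),
                       FakeElement("1,834.50", True),
                       FakeElement("", False)])
    print(visible_texts(fake, "gpxtickerLeft_price"))
```

With a real session the call would be visible_texts(driver, "gpxtickerLeft_price"), which filters out any hidden duplicates that share the id.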
Upvotes: 1
Reputation: 33381
Yes, your XPath is missing a tag name.
So it should be //span[@id="gpxtickerLeft_price"]
or //*[@id="gpxtickerLeft_price"]
Upvotes: 1