Reputation: 83
I am trying to print out values from excel, and values are in numbers. I goal is to read these values and search in google one by one. Will stop for x seconds when the value is 'nan', then skip this 'nan' and then keep moving on to next.
Problems faced:
My solution:
Code:
import pandas as pd
import numpy as np
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.keys import Keys
from selenium.webdriver import ActionChains
import msvcrt
import datetime
import time
driver = webdriver.Chrome()
#Settings
pd.set_option('display.width',10, 'display.max_rows', 10, 'display.max_colwidth',100, 'display.width',10, 'display.float_format', lambda x: '%0.2f' % x)
df = pd.read_excel(r"BARCODE.xlsx", skiprows = 2, sheet_name = 'products id')
#Unnamed: 1 is also an empty column, i just didn't input UPC as title in excel.
upc = df['Unnamed: 1']
#I can't print out as interger...It will always have a xxxxx.0
print((upc[0:20]))
count = len(upc)
i = 0
for i in range(count ):
if upc[i] == 'nan':
'skip for x seconds and continue, i am not sure how to do yet'
else:
print(int(upc[i]))
driver.get('https://www.google.com')
driver.find_element_by_name('q').send_keys(int(upc[i]))
i = i + 1
Print out:
3337872411991.0
3433422408159.0
3337875598071.0
3337872412516.0
3337875518451.0
3337875613491.0
3337872413025.0
3337875398961.0
3337872410208.0
nan <- i want program to stop here so i can do something else.
3337872411991.0
3433422408159.0
3337875598071.0
3337872412516.0
3337875518451.0
3337875613491.0
3337872413025.0
3337875398961.0
3337872410208.0
nan
Name: Unnamed: 1, Length: 20, dtype: float64
3337872411991
3433422408159
3337875598071
3337872412516
3337875518451
etc....
Googled some format about number, such as set printing format, but I got confused between .format and lambda.
Upvotes: 0
Views: 250
Reputation: 51
It is printing out in scientific notation format
It seems you have numbers like UPC and EANs. You can probably solve that by marking numbers as text instead. If you need to have always length 13 you can correct it with appending zeroes at start.
Want to stop doing something when its
nan
in excel
Simplest solution could be to use input and accept any character to continue executing your code. But if you want to have few seconds time.sleep()
is good as well
Copy UPC[i] into google search, but i wanted to only copy once, due to i want to design it open new tab then copy the second UPC[i]
Some points you may want to reconsider:
enumerate()
if you need index values. If you do not need index you may simply drop it instead. for value in data_frame['UPC']:
Below you can check out working example (at least on my machine with python3
, w10
and chrome exe driver
).
import pandas as pd
from time import sleep
from selenium import webdriver
from selenium.webdriver import ActionChains
from selenium.webdriver.common.keys import Keys
# Settings
pd.set_option('display.width', 10, 'display.max_rows', 10, 'display.max_colwidth', 100, 'display.width', 10,
'display.float_format', lambda x: '%0.2f' % x)
data_frame = pd.read_excel('test.xlsx', sheet_name='products id', skip_blank_lines=False)
# I have chrome driver in exe, so this is how I need to inject it to get driver out
driver = webdriver.Chrome('chromedriver.exe')
google = 'https://www.google.com'
for index, value in enumerate(data_frame['UPC']): # named the column in excel file
if pd.isna(value):
print('{}: zzz'.format(index))
sleep(2) # will sleep for 2 seconds, use input() if you want to wait indefinitely instead
else:
print('{}: {} {}'.format(index, value, type(value)))
# since given values are float, you can convert it to int
value = int(value)
driver.get(google)
google_search = driver.find_element_by_name('q')
google_search.send_keys(value)
google_search.send_keys('\uE007') # this is "ENTER" for committing your search in google or Keys.ENTER
sleep(0.5)
# you may want to wait a bit before page loads fully, then scrape info you want
# also consider using try-except blocks if something unexpected happens
# if you want to open new tab (windows + chrome driver)
# open a link in a new window - workaround
helping_link = driver.find_element_by_link_text('Help')
actions = ActionChains(driver)
actions.key_down(Keys.CONTROL).click(helping_link).key_up(Keys.CONTROL).perform()
driver.switch_to.window(driver.window_handles[-1])
# close your instance of chrome driver or leave it if you need your tabs
# driver.close()
Upvotes: 1
Reputation: 305
if upc[i].isnull():
time.sleep(3)
check out this post, which boils down to:
driver.execute_script("window.open('https://www.google.com');") driver.switch_to.window(driver.window_handles[-1])
Upvotes: 1