So I've made a Selenium bot that iterates through a list of territorial codes and sends each code to a search box on a website, which turns the code into a city name. I then scrape that name so that I end up with a list of cities in place of the list of codes. The problem is that while my for loop iterates through the list, there are moments when it "skips" the commands and goes straight to the next iteration, so I am not getting a full list of cities. Some codes in the list are absent or unfit to pass to the website, so I made exceptions for those situations.
```python
import time
import pandas
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chrome_driver_path = r"D:\Development\chromedriver.exe"
driver = webdriver.Chrome(chrome_driver_path)
driver.get("https://eteryt.stat.gov.pl/eTeryt/rejestr_teryt/udostepnianie_danych/baza_teryt/uzytkownicy_indywidualni/wyszukiwanie/wyszukiwanie.aspx?contrast=default")
# Get the column with the codes from the Excel sheet and turn it into a list.
data = pandas.read_excel(r"D:\NFZ\FINAL Baza2WUJEK(poprawione1)-plusostatniepoprawki.xlsx")
codes = data["Kody terytorialne"].tolist()
cities = []
iteration = 0
for code in codes:
    time.sleep(0.05)
    iteration += 1
    print(iteration)
    if code == "Absence":
        cities.append("Absence")
    elif code == "Error":
        cities.append("Error")
    elif code == 2211041 or code == 2211021:
        cities.append("Manual")
    else:
        # Send territorial code
        driver.find_element_by_xpath('//*[@id="body_TabContainer1_TabPanel1_TBJPTIdentyfikator"]').clear()
        driver.find_element_by_xpath('//*[@id="body_TabContainer1_TabPanel1_TBJPTIdentyfikator"]').send_keys(code)
        # Search
        try:
            button = WebDriverWait(driver, 20).until(
                EC.presence_of_element_located((By.XPATH,
                    '/html/body/form/section/div/div[2]/div[2]/div/div[2]/div/div[2]/div[1]/div[2]/div[1]/div/input')))
            button.click()
        except:
            button = WebDriverWait(driver, 20).until(
                EC.presence_of_element_located((By.XPATH,
                    '/html/body/form/section/div/div[2]/div[2]/div/div[2]/div/div[2]/div[1]/div[2]/div[1]/div/input')))
            button.click()
        # Scrape city name
        city = WebDriverWait(driver, 20).until(
            EC.presence_of_element_located((By.XPATH, '//*[@id="body_TabContainer1_TabPanel1_GVTERC"]/tbody/tr[2]/td[1]/strong'))).text.split()
        print(code)
        print(city)
        cities.append(city)
table = {
    "Cities": cities
}
df = pandas.DataFrame.from_dict(table)
df.to_excel("cities-FINAL.xlsx")
driver.close()
```
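The special cases in the loop above can also be kept in a single lookup table, and pairing every result with the code it came from makes a missing or shifted row easy to spot in the output file. This is a minimal sketch, assuming the same placeholders as the question. `SPECIAL_CASES`, `lookup_all` and the `search` callable are hypothetical names; `search` stands in for the Selenium steps (fill the box, click, read the result):

```python
# Hypothetical lookup table mirroring the if/elif chain in the question:
# codes that must not be sent to the website map to a placeholder value.
SPECIAL_CASES = {
    "Absence": "Absence",
    "Error": "Error",
    2211041: "Manual",
    2211021: "Manual",
}


def lookup_all(codes, search):
    """Return one (code, result) pair per input code.

    `search` is a hypothetical callable that takes a territorial code and
    returns the scraped city name. It is only called for codes that are
    not in SPECIAL_CASES.
    """
    rows = []
    for code in codes:
        placeholder = SPECIAL_CASES.get(code)
        if placeholder is not None:
            rows.append((code, placeholder))
        else:
            rows.append((code, search(code)))
    return rows


if __name__ == "__main__":
    # A dict lookup stands in for the real website search.
    fake_search = {2201025: "Kędzierzyn-Koźle", 2262011: "Bytów"}.get
    for row in lookup_all([2201025, "Absence", 2262011, 2211041], fake_search):
        print(row)
```

To keep the codes next to the cities in the Excel file, the pairs can be written with `pandas.DataFrame(rows, columns=["Code", "City"])` before calling `to_excel`.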
Here is part of my console log. As you can see, after printing iteration number 98 it skips straight to 99, where everything works fine and both the territorial code and the city are printed. The problem occurs again further into the loop, but every time it starts at iteration 98. The territorial code for that iteration is not one of the exceptions.
```text
96 <-- Iteration
2201025 <-- Territorial Code
['Kędzierzyn-Koźle', '(2201025)'] <-- City Name
97
2262011
['Bytów', '(2262011)']
98 !<-- Just iteration!
99
2205084
['Gdynia', '(2208011)']
```
**Quick note in response to the answers: the print statements appear in the console in this order: first the iteration number, second the territorial code for that iteration, third the city name.**
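In the log, the result printed for iteration 99 is `Gdynia (2208011)` although the code sent was `2205084`, so that row does not belong to the code that was searched; one plausible reading is that the script read a results table that had not yet been refreshed. Because every other result in the log ends with the searched code in parentheses, one way to catch this is to check that suffix before accepting a row. This is a minimal sketch, assuming the result cell always has the form `City name (code)` as in the log; `parse_city` is a hypothetical helper:

```python
def parse_city(tokens, code):
    """Return the city name from the split result text, or None if the row
    belongs to a different code (for example a stale results table).

    `tokens` is the output of `.text.split()`, e.g. ['Bytów', '(2262011)'].
    Assumes the last token is the searched code in parentheses.
    """
    if not tokens or tokens[-1] != f"({code})":
        return None
    # Re-join the remaining tokens so multi-word names survive the split.
    return " ".join(tokens[:-1])


if __name__ == "__main__":
    print(parse_city(["Bytów", "(2262011)"], 2262011))
    print(parse_city(["Gdynia", "(2208011)"], 2205084))
```

A `None` result can then be logged or retried instead of being appended to the list as if it were the right city.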
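There are several problems here. I tried to improve your code a little: the version below uses short ID-based locators instead of the long absolute XPaths, waits for elements to be visible rather than merely present, drops the duplicated try/except around the search click, and pauses for two seconds before reading the result so the table has time to refresh. Please try it.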
```python
import time
import pandas
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

chrome_driver_path = r"D:\Development\chromedriver.exe"
driver = webdriver.Chrome(chrome_driver_path)
driver.get("https://eteryt.stat.gov.pl/eTeryt/rejestr_teryt/udostepnianie_danych/baza_teryt/uzytkownicy_indywidualni/wyszukiwanie/wyszukiwanie.aspx?contrast=default")

# Get the column with the codes from the Excel sheet and turn it into a list.
data = pandas.read_excel(r"D:\NFZ\FINAL Baza2WUJEK(poprawione1)-plusostatniepoprawki.xlsx")
codes = data["Kody terytorialne"].tolist()

# Locators based on element IDs rather than long absolute XPaths.
code_input_xpath = '//*[@id="body_TabContainer1_TabPanel1_TBJPTIdentyfikator"]'
search_button_xpath = '//input[@id="body_TabContainer1_TabPanel1_BJPTWyszukaj"]'
city_xpath = '//table[@id="body_TabContainer1_TabPanel1_GVTERC"]//td/strong'

cities = []
iteration = 0
for code in codes:
    time.sleep(0.1)
    iteration += 1
    print(iteration)
    if code == "Absence":
        cities.append("Absence")
    elif code == "Error":
        cities.append("Error")
    elif code == 2211041 or code == 2211021:
        cities.append("Manual")
    else:
        # Send territorial code as text
        code_input = driver.find_element(By.XPATH, code_input_xpath)
        code_input.clear()
        code_input.send_keys(str(code))
        # Search
        button = WebDriverWait(driver, 20).until(
            EC.visibility_of_element_located((By.XPATH, search_button_xpath)))
        button.click()
        # Give the results table time to refresh before reading it
        time.sleep(2)
        # Scrape city name
        city = WebDriverWait(driver, 20).until(
            EC.visibility_of_element_located((By.XPATH, city_xpath))).text.split()
        print(code)
        print(city)
        cities.append(city)

table = {
    "Cities": cities
}
df = pandas.DataFrame.from_dict(table)
df.to_excel("cities-FINAL.xlsx")
driver.close()
```
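The fixed `time.sleep(2)` gives the page time to refresh, but it costs two seconds on every search and can still be too short on a slow response. An alternative is to poll until the result cell mentions the code that was just searched. Selenium offers this check as `EC.text_to_be_present_in_element`, for example `WebDriverWait(driver, 20).until(EC.text_to_be_present_in_element((By.XPATH, city_xpath), f"({code})"))`. The sketch below shows the same idea without Selenium so the logic can be tested on its own. It assumes, as in the log, that the result row echoes the searched code; `wait_for_text` and its parameters are hypothetical names:

```python
import time


def wait_for_text(read_text, expected, timeout=20.0, poll=0.25, ignored=()):
    """Call read_text() until its result contains `expected`, then return it.

    Raises TimeoutError if `expected` does not appear within `timeout`
    seconds. Exceptions whose types are listed in `ignored` are treated as
    "not ready yet" instead of ending the wait.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            text = read_text()
            if expected in text:
                return text
        except ignored:
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{expected!r} did not appear within {timeout} s")
        time.sleep(poll)
```

With Selenium, `read_text` could be `lambda: driver.find_element(By.XPATH, city_xpath).text` and `ignored` could be `(StaleElementReferenceException, NoSuchElementException)` from `selenium.common.exceptions`, since the table is redrawn after each search. The returned text can then be split as before, replacing both the `time.sleep(2)` and the final wait.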