Reputation: 1
I am new to Python. I am scraping a website for links and then extracting data from those links, and I need help with two issues. There are over 2500 links, and adding the URLs to a list works fine. However, the script usually stops after 200-300 extractions because of an element-not-clickable error. I have added an EC wait for the element and a time.sleep, but those still don't help. How can I tell the script to retry after such an error and continue with the rest of the links?
The second problem is that pandas keeps adding index numbers even though I have set show_index and index to False.
Any help is much appreciated.
import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

urls = []
num = 91
while True:
    main_url = 'url' + str(num) + '%7D'
    driver.get(main_url)
    driver.maximize_window()
    time.sleep(3)
    list_links = driver.find_elements_by_css_selector('div.feedItemMessage a')
    for link in list_links:
        url = link.get_attribute('href')
        if 'details' in url:
            urls.append(url)
    num += 1
    if num > 92:
        break
print('number of links to extract ' + str(len(urls)))
for id_url in urls:
    driver.get(id_url)
    time.sleep(2)
    WebDriverWait(driver, 120).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="wrapper"]/div[8]/div/div/div/div[1]/div/div[2]/ul/div/div[5]/a'))).click()
    switch_tab = driver.switch_to.window(driver.window_handles[1])
    url_id = (driver.current_url)
    id = str(url_id)
    file_name = driver.find_element_by_xpath('//*[@id="HeaderSingNumberD"]').text
    rit = driver.page_source
    soup = BeautifulSoup(rit, 'html5lib')
    tables = soup.find_all('table')
    table_rows = soup.find_all('tr')
    cells = soup.find_all('td')
    df = pd.read_html(str(tables))
    df_rows = pd.read_html(str(table_rows))
    df_cells = pd.read_html(str(cells))
    dfall = pd.DataFrame(df)
    # dfallnoindex = dfall.style.hide_index()
    dfallspecs = dfall[4:14]
    try:
        dfallspecs.to_excel(file_name + '.xls', encoding="hebrew", index=False, index_label=None, header=False)
    except UnicodeEncodeError:
        pass
    close_tab = driver.close()
    switch_tab = driver.switch_to.window(driver.window_handles[0])
Upvotes: 0
Views: 147
Reputation: 6354
You need to add try/except/continue around the block that raises the error. E.g.:
while True:
    ...some code...
    try:
        line_that_raises_TypeError()
    except TypeError:
        continue  # when this gets hit, execution continues with the next iteration of the loop
    ...some more code...
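If you also want to retry the failing line a few times before skipping it, you can wrap it in a small retry loop. Here is a minimal sketch; `flaky` is a hypothetical stand-in for whatever call raises, and the retry count of 5 is arbitrary:

```python
import time

def flaky():
    # hypothetical stand-in for the line that raises; succeeds on the 3rd call
    flaky.calls += 1
    if flaky.calls < 3:
        raise TypeError("not ready yet")
    return "ok"
flaky.calls = 0

result = None
for attempt in range(5):     # retry up to 5 times
    try:
        result = flaky()
        break                # success: stop retrying
    except TypeError:
        time.sleep(1)        # short pause before the next attempt
else:
    result = "gave up"       # all attempts failed

print(result)  # "ok" on the third attempt
```

The for/else form runs the else branch only when the loop finishes without hitting break, which makes it a natural place for the give-up case.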
The same principle applies to for loops, of course. Note that continue can also be used without an exception context:
for item in items:
    ...some code...
    if item == "something to skip":
        continue
    ...some more code...
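Combining the two ideas for a loop over URLs might look like the sketch below. `extract` is a hypothetical stand-in for the per-URL work (in your case, the Selenium clicking and scraping); failed URLs are remembered so you can retry them in a second pass instead of losing them:

```python
failed = []
results = []

def extract(url):
    # hypothetical stand-in for the real per-URL work; raises for one URL
    if url == "http://example.com/2":
        raise RuntimeError("element not clickable")
    return url.upper()

for url in ["http://example.com/1", "http://example.com/2", "http://example.com/3"]:
    try:
        results.append(extract(url))
    except RuntimeError:
        failed.append(url)  # remember it for a later retry pass
        continue            # move on to the next URL instead of crashing

print(results)  # the two URLs that succeeded
print(failed)   # the one that raised
```

Catch the narrowest exception you can (for Selenium that would be the specific exception class being raised, not a bare except), so genuine bugs still surface.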
Upvotes: 1