user707

Reputation: 1

Solving TimeoutException with Python (Selenium)

For a small project I need to extract insolvency announcements from the following website: https://neu.insolvenzbekanntmachungen.de/ap/suche.jsf. After entering today's date, selecting "Eröffnungen" from the "Gegenstand der Veröffentlichung" dropdown menu, and clicking "search" at the bottom, I need the text that is hidden behind the zoom icon of each result on the following page.

The Python code I use to implement this is the following:

import...

driver = webdriver.Safari()
driver.maximize_window()
driver.get("https://neu.insolvenzbekanntmachungen.de/ap/suche.jsf")

date_input = WebDriverWait(driver, 10).until(
    ec.visibility_of_element_located((By.ID, "frm_suche:ldi_datumVon:datumHtml5"))
)
curr_date = datetime.today().strftime("%Y-%m-%d")
arg_value = f"arguments[0].value = '{curr_date}';"
driver.execute_script("arguments[0].value = '';", date_input)
driver.execute_script(arg_value, date_input)

# Select the "Eröffnungen" option from the dropdown menu
select_element = WebDriverWait(driver, 20).until(
    ec.visibility_of_element_located((By.ID, "frm_suche:lsom_gegenstand:lsom"))
)
driver.execute_script("arguments[0].scrollIntoView(true);", select_element)
time.sleep(1)
select = Select(select_element)
select.select_by_value("2")

# Execute the search
driver.execute_script("arguments[0].dispatchEvent(new Event('change'));", date_input)
date_input.send_keys(Keys.RETURN)

results_table = WebDriverWait(driver, 20).until(
    ec.visibility_of_element_located((By.ID, "tbl_ergebnis"))
)

rows = results_table.find_elements(By.TAG_NAME, "tr")
data = []

# Iterate over each row of the table
for i, row in enumerate(rows):
    cells = row.find_elements(By.TAG_NAME, "td")
    if len(cells) > 0:
        # Click the zoom icon to open the new window
        zoom_icon = cells[6].find_element(By.CSS_SELECTOR, "input[type='image']")
        WebDriverWait(driver, 20).until(
            ec.element_to_be_clickable((By.CSS_SELECTOR, "input[type='image']"))
        )
        driver.execute_script("arguments[0].scrollIntoView(true);", zoom_icon)
        time.sleep(1)  # Wait for scroll
        driver.execute_script("arguments[0].click();", zoom_icon)

        # Wait for the new window to open
        WebDriverWait(driver, 20).until(ec.new_window_is_opened)
        driver.switch_to.window(driver.window_handles[1])

        # Extract the publication text
        print("here")
        WebDriverWait(driver, 20).until(
            ec.presence_of_element_located(
                (By.XPATH, "//form[@id='form']//pre[@id='veroefftext']")
            )
        )
        print("here2")
        pub_text = driver.find_element(By.XPATH, "//form[@id='form']//pre[@id='veroefftext']").text

        # Close the new window and switch to the first window again
        driver.close()
        driver.switch_to.window(driver.window_handles[0])

        # Store the extracted data
        data.append({
            'veroeffentlichungsdatum': cells[0].text,
            'aktenzeichen': cells[1].text,
            'gericht': cells[2].text,
            'name_vorname_bezeichnung': cells[3].text,
            'sitz_wohnsitz': cells[4].text,
            'register': cells[5].text,
            'veroeffentlichungstext': pub_text
        })

df = pd.DataFrame(data)
df.columns = ['veroeffentlichungsdatum', 'aktuelles_aktenzeichen', 'gericht',
              'name_vorname_bezeichnung', 'sitz_wohnsitz', 'register', 'veroeffentlichungstext']
df.to_excel("data/insos.xlsx", index=False)

I'm relatively new to Selenium, but I think the code itself works and does what it's supposed to. However, in the part between the two print statements (print("here") and print("here2")), the element often cannot be located and I get a TimeoutException. This happens seemingly at random: sometimes it works and sometimes it doesn't. I have also had one run in which I was able to extract the text behind every zoom icon.

Could it be that the identifiers are not unique? I have read related posts, and most of them suggested using XPath, which I have already implemented. I also tried driver.switch_to.default_content() after the print("here") statement, as suggested in a related post, but the problem remains.

Any help on how to make this stable is much appreciated, as the script is supposed to run every day.

Upvotes: 0

Views: 54

Answers (2)

Farrukh Naveed Anjum

Reputation: 250

Try the following:

  • Increase the wait time.
  • Use visibility-based conditions (for example ec.visibility_of_element_located) in your waits.
  • Handle exceptions properly with try/except, and always pay close attention to this.
  • Make sure to close the window once you are done.
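
As a rough illustration of the exception-handling and window-closing points, here is a minimal sketch rather than a definitive implementation. The helper names retry and read_in_popup, and all of their parameters, are hypothetical (they are not part of Selenium). The only driver calls used are driver.close() and driver.switch_to.window(), both of which already appear in the question.

```python
import time


def retry(action, attempts=3, delay=1.0, retry_on=(Exception,)):
    """Call action() until it succeeds or `attempts` (>= 1) tries are used up.

    Hypothetical helper: only the exception types listed in `retry_on`
    trigger another attempt; the last failure is re-raised unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay)


def read_in_popup(driver, main_handle, read_text):
    """Run read_text(driver) inside the popup that is currently focused.

    The popup is closed and focus returns to `main_handle` even when
    read_text raises, so a failed row does not leave a stray window open.
    """
    try:
        return read_text(driver)
    finally:
        driver.close()
        driver.switch_to.window(main_handle)
```

In the question's loop, the whole "click the zoom icon, switch, read, close" sequence could be wrapped in one function and passed to retry with retry_on=(TimeoutException,), where TimeoutException is imported from selenium.common.exceptions. Remembering the main window's handle before the click avoids relying on driver.window_handles[0].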

Upvotes: 0

Slava Pasedko

Reputation: 124

You explicitly wait for the element to be located, and you said yourself that sometimes it works and sometimes it does not. Your timeout is 20 seconds (the second argument of WebDriverWait is given in seconds).

Based on what you described, the element sometimes appears within that time and sometimes does not. This means you have to tune the timeout until the element reliably has enough time to appear.
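
Before tuning the timeout, it may be worth checking the wait for the new window. In the question, ec.new_window_is_opened is passed to until() without being called. As far as I know it expects the list of handles that existed before the click, as in ec.new_window_is_opened(handles_before), and without that argument the wait likely returns immediately. The order of driver.window_handles is also not guaranteed, so index 1 may not be the popup. The following is a minimal sketch of the same idea with plain polling. The function name switch_to_new_window and its parameters are hypothetical, and it only assumes driver.window_handles and driver.switch_to.window().

```python
import time


def switch_to_new_window(driver, handles_before, timeout=20.0, poll=0.25):
    """Wait until a handle not in `handles_before` appears, then switch to it.

    Hypothetical helper: `handles_before` must be captured *before* the
    click that opens the popup. Raises TimeoutError if nothing new shows up
    within `timeout` seconds.
    """
    known = set(handles_before)
    deadline = time.monotonic() + timeout
    while True:
        new_handles = [h for h in driver.window_handles if h not in known]
        if new_handles:
            driver.switch_to.window(new_handles[0])
            return new_handles[0]
        if time.monotonic() >= deadline:
            raise TimeoutError("no new window appeared within the timeout")
        time.sleep(poll)
```

Usage would be along the lines of handles_before = driver.window_handles, then the click on the zoom icon, then switch_to_new_window(driver, handles_before). If the TimeoutException still occurs after that, the timeout of the wait for the pre element is the next thing to adjust.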

Upvotes: 0
