Reputation: 13
I am new to web scraping and need to scrape some data from a website for my research: https://www.promedmail.org/.
What I coded is shown below. However, at step #5 I can't click the link, even though I successfully obtained the <a> tag using the article ID. The error message says:
selenium.common.exceptions.ElementNotInteractableException: Message: Element <a id="id6519943" class="lcl" href="javascript:;"> could not be scrolled into view
After some research, I figured out that I need to scroll to the link because it is not visible. I tried 5 different solutions suggested on Stack Overflow, but none of them worked for me and I am stuck. They are listed, commented out, in the code below.
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC


class WebScraper:
    """Custom web scraper"""

    def __init__(self, url, keyword):
        self.url = url
        self.keyword = keyword
        self.search_results = []
        self.ariticle_ids = []

    def get_all_data(self):
        """Get beautiful soup objects for all articles"""
        driver = webdriver.Firefox()
        driver.get(self.url)

        # Open the search tab, enter the keyword, and submit the search
        driver.find_element_by_id('search_tab').click()
        driver.find_element_by_id('searchterm').send_keys(self.keyword)
        driver.find_element_by_css_selector('#searchby_other > input[type=submit]').click()

        # Collect the article IDs from the search results list
        element_article_id = driver.find_element_by_css_selector('#search_results > ul')
        source_article_id = element_article_id.get_attribute('outerHTML')
        soup_article_id = BeautifulSoup(source_article_id, 'html.parser')
        tag_a = soup_article_id.select('ul > li > a[id]')
        for i in range(len(tag_a)):
            self.ariticle_ids.append(tag_a[i].get('id'))

        # Click the first article link (this is where the error is raised)
        element_link = driver.find_element_by_id(self.ariticle_ids[0])
        # Attempted fixes, none of which worked:
        # driver.execute_script("arguments[0].scrollIntoView();", element_link)
        # driver.execute_script("window.scrollBy(0, -150);")
        # element_link.location_once_scrolled_into_view
        # ActionChains(driver).move_to_element(driver.find_element_by_id(self.ariticle_ids[0])).perform()
        # WebDriverWait(driver, 1000000).until(EC.element_to_be_clickable((By.ID, self.ariticle_ids[0]))).click()
        element_link.click()


if __name__ == "__main__":
    url = 'https://www.promedmail.org/'
    keyword = 'ebola'
    webscraper = WebScraper(url, keyword)
    webscraper.get_all_data()
When the link is clicked, a preview pops up in the right panel. I plan to scrape the article and then move on to the next link.
Upvotes: 1
Views: 1343
Reputation: 14145
Quick Solution: You can click the link using JavaScript, as below.
driver.execute_script("arguments[0].click()", driver.find_element_by_id(self.ariticle_ids[0]))
Root Cause: There are 2 elements in the HTML with the same id. The first one is under latest_alerts, which is hidden while the search results are displayed. The second one is the link shown on screen under the search results. That is why Selenium cannot scroll to the element: find_element_by_id returns the first match when several elements share the same id.
You can confirm this with the line of code below.
print(len(driver.find_elements_by_id(self.ariticle_ids[0])))
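If you would rather not depend on the order of the duplicates, one option is to filter the matches on visibility. The helper below is only a minimal sketch: `first_displayed` is a hypothetical name, and it assumes nothing beyond each element exposing `is_displayed()`, as Selenium's WebElement does.

```python
def first_displayed(elements):
    """Return the first element whose is_displayed() is true, else None.

    `elements` is any iterable of objects exposing is_displayed(), such as
    the list returned by driver.find_elements_by_id(...).
    """
    for element in elements:
        if element.is_displayed():
            return element
    return None


# Intended use inside get_all_data (assumes the driver and ids from the question):
# candidates = driver.find_elements_by_id(self.ariticle_ids[0])
# element_link = first_displayed(candidates)
```

Returning None when nothing is visible lets the caller decide whether to wait and retry or to skip the article.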
Solution: If you want to scroll to the element in the search results and then click it, you can use the code below, which takes the last of the matching elements.
element_link = driver.find_elements_by_id(self.ariticle_ids[0])[-1]
element_link.location_once_scrolled_into_view
element_link.click()
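Another possible approach, sketched here under assumptions, is to scope the lookup to the search results container so that the hidden copy under latest_alerts is never matched. The selector reuses `#search_results > ul` and `li > a[id]` from the question's code. `search_result_selector` is a hypothetical helper, and the assumption that the links are direct children of those list items has not been checked against the live page.

```python
def search_result_selector(article_id):
    """Build a CSS selector for the link with this id inside #search_results.

    Assumes the structure #search_results > ul > li > a, taken from the
    selectors used in the question.
    """
    return '#search_results > ul > li > a[id="{}"]'.format(article_id)


# Intended use (assumes the driver and ids from the question):
# selector = search_result_selector(self.ariticle_ids[0])
# element_link = driver.find_element_by_css_selector(selector)
# element_link.click()
```

The attribute form `a[id="..."]` is used instead of `a#...` so the id does not need CSS escaping if it ever starts with a digit.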
Upvotes: 1