Reputation: 103
I'm trying to use Python and Selenium to scrape multiple links on a web page. I'm using find_elements_by_xpath
and I'm able to locate a list of elements, but I'm having trouble converting the returned list into the actual href
links. I know find_element_by_xpath works, but that only works for one element.
Here is my code:
from selenium import webdriver

path_to_chromedriver = 'path to chromedriver location'
browser = webdriver.Chrome(executable_path = path_to_chromedriver)
browser.get("file:///path to html file")
all_trails = []
# Find all elements with the class 'text-truncate trail-name', then
# retrieve the <a> element inside each one.
# This seems to give us just the WebElement objects, not the
# actual links.
find_href = browser.find_elements_by_xpath('//div[@class="text truncate trail-name"]/a[1]')
all_trails.append(find_href)
print(all_trails)
This code is returning:
<selenium.webdriver.remote.webelement.WebElement
(session="dd178d79c66b747696c5d3750ea8cb17",
element="0.5700549730549636-1663")>,
<selenium.webdriver.remote.webelement.WebElement
(session="dd178d79c66b747696c5d3750ea8cb17",
element="0.5700549730549636-1664")>,
I expect the all_trails array to be a list of links like: www.google.com, www.yahoo.com, www.bing.com.
I've tried looping through the all_trails list and running the get_attribute('href') method on the list, but I get the error:
AttributeError: 'list' object has no attribute 'get_attribute'
Does anyone have any idea how to convert the Selenium WebElements to href links?
Any help would be greatly appreciated :)
Upvotes: 4
Views: 20101
Reputation: 1
get_attribute works only on the elements of that list, not on the list itself. For example:
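import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

# Assumption: a Chrome driver; the original snippet did not show how driver was created.
driver = webdriver.Chrome()

def fetch_img_urls(search_query: str):
    driver.get('https://images.google.com/')
    search = driver.find_element(By.CLASS_NAME, "gLFyf.gsfi")
    search.send_keys(search_query)
    search.send_keys(Keys.RETURN)
    links = []
    try:
        time.sleep(5)  # crude wait for the results to load
        urls = driver.find_elements(By.CSS_SELECTOR, 'a.VFACy.kGQAp.sMi44c.lNHeqe.WGvvNb')
        for url in urls:
            # get_attribute is called on each element, not on the list
            links.append(url.get_attribute("href"))
        print(links)
    except Exception as e:
        print(f'error: {e}')
    driver.quit()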
Upvotes: 0
Reputation: 41
Use the singular form, find_element_by_css_selector, if you only need one element.
The plural form, find_elements_by_css_selector, returns a list of WebElements,
so you need to loop through the list and call get_attribute on each WebElement.
Upvotes: 2
Reputation: 6459
If you have the following HTML:
<div class="text-truncate trail-name">
<a href="http://google.com">Link 1</a>
</div>
<div class="text-truncate trail-name">
<a href="http://google.com">Link 2</a>
</div>
<div class="text-truncate trail-name">
<a href="http://google.com">Link 3</a>
</div>
<div class="text-truncate trail-name">
<a href="http://google.com">Link 4</a>
</div>
Your code should look like:
all_trails = []
all_links = browser.find_elements_by_css_selector(".text-truncate.trail-name>a")
for link in all_links:
    all_trails.append(link.get_attribute("href"))
Here all_trails is a list of the href values of the links (Link 1, Link 2 and so on).
Hope it helps you!
Upvotes: 4
Reputation: 1759
all_trails = []
find_href = browser.find_elements_by_xpath('//div[@class="text truncate trail-name"]/a[1]')
for i in find_href:
    all_trails.append(i.get_attribute('href'))
get_attribute works on the elements of that list, not on the list itself.
Upvotes: 2
Reputation: 193378
Let us see what's happening in your code.
Without any visibility into the relevant HTML, it seems the following line returns two WebElements in the list find_href, which is in turn appended to the all_trails list:
find_href = browser.find_elements_by_xpath('//div[@class="text truncate trail-name"]/a[1]')
Hence, when we print the all_trails list, both WebElements are printed, and there is no error.
As per the error screenshot you provided, you are trying to invoke the get_attribute("href") method on a list, which is not supported. Hence you see the error:
AttributeError: 'list' object has no attribute 'get_attribute'
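The failure can be reproduced without a browser. The sketch below is only an illustration: StubElement is a hypothetical stand-in for a Selenium WebElement (it is not part of Selenium), and the URLs are placeholders. It shows that the method lookup fails on the list but succeeds on each item.

```python
class StubElement:
    """Hypothetical stand-in for a Selenium WebElement (illustration only)."""

    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        # Only the "href" attribute is modelled in this stub.
        return self._href if name == "href" else None


# find_elements_* returns a plain Python list of elements.
found = [StubElement("http://google.com"), StubElement("http://yahoo.com")]

try:
    found.get_attribute("href")  # wrong: called on the list itself
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# Calling the method on each element of the list works.
hrefs = [element.get_attribute("href") for element in found]
```

A plain Python list has no get_attribute method, so the call has to go on each item the list contains.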
To get the href attribute, we have to iterate over the list as follows:
find_href = browser.find_elements_by_xpath('//your_xpath')
for my_href in find_href:
    print(my_href.get_attribute("href"))
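On newer Selenium releases the find_elements_by_* helpers were deprecated in Selenium 4 and later removed, so the same loop is usually written with find_elements(By.XPATH, ...). The following is a minimal sketch of that idea, not a definitive implementation: collect_hrefs is a hypothetical helper name, and the XPATH constant is defined locally (it has the same string value as By.XPATH) so the snippet does not need Selenium installed to be read or tested.

```python
XPATH = "xpath"  # same string value as selenium.webdriver.common.by.By.XPATH


def collect_hrefs(driver, xpath):
    """Return the href of every element matched by xpath.

    Elements without an href attribute are skipped.
    """
    elements = driver.find_elements(XPATH, xpath)  # always a list, possibly empty
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")  # called per element, not on the list
        if href:
            hrefs.append(href)
    return hrefs
```

With a real browser session this could be called as all_trails = collect_hrefs(browser, '//your_xpath'), assuming browser is a Selenium 4 WebDriver instance.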
Upvotes: 11