Reputation: 55
I have searched the site hoping to find an answer, but none of the questions I found deal with HTML as heavily nested as the page I am trying to scrape. I am really hoping someone will spot my obvious error. The following code pulls the category headers but, annoyingly, not the href that goes with each one. When run, it currently returns 'None' for every href and I cannot work out why. I think I may be targeting the wrong element, tag or class in the HTML, but I cannot identify which one it should be.
from selenium import webdriver
import time
# The website to scrape
url = "https://www.jtinsight.com/JTIRA/JTIRA.aspx#!/full-category-list"
# Creating the WebDriver object using the ChromeDriver
driver = webdriver.Chrome()
# Directing the driver to the defined url
driver.get(url)
# driver.implicitly_wait(5)
time.sleep(1)
# Locate the categories
categories = driver.find_elements_by_xpath('//div[@class="subCatEntry ng-scope"]')
# Print out all categories on current page
num_page_items = len(categories)
print(num_page_items)
for headers in range(num_page_items):
    print(categories[headers].text)
for elem in categories:
    print(elem.get_attribute("a.divLink[href='*']"))
# Clean up (close browser once task is completed)
time.sleep(1)
driver.close()
I would really appreciate if anyone can point out my error.
Upvotes: 0
Views: 84
Reputation: 2040
You are passing a CSS selector to the get_attribute method. That won't work: get_attribute expects an attribute name only. If the web element elem has an attribute named href, it will return the value of that attribute.
First, get the anchor <a> elements. All the subcategory anchors have the class divLink. To get the anchor elements, try this:
categories = driver.find_elements_by_class_name('divLink')
Second, print the attribute value by passing the attribute name to get_attribute. Try this:
print(elem.get_attribute("href"))
This way you'll be able to print all the href values.
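Put together, the two steps might look roughly like the sketch below. It assumes the driver has already loaded the page and that the anchors really carry the class divLink. The helper name anchor_hrefs is made up for illustration. The locator uses the generic find_elements(by, value) form, since newer Selenium 4 releases no longer ship the find_elements_by_* helpers.

```python
# Hypothetical helper; `driver` is assumed to be a started Selenium WebDriver
# that has already navigated to the category page.
CLASS_NAME = "class name"  # same string value as selenium's By.CLASS_NAME


def anchor_hrefs(driver):
    """Return the href of every element with class divLink on the current page."""
    anchors = driver.find_elements(CLASS_NAME, "divLink")
    # get_attribute takes an attribute *name* ("href"), not a CSS selector.
    return [anchor.get_attribute("href") for anchor in anchors]
```

With a real browser session you would call print(anchor_hrefs(driver)) after driver.get(url) and whatever wait the page needs.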
Upvotes: 0
Reputation: 33384
Try the code below.
for elem in categories:
    print(elem.find_element_by_css_selector("a.divLink").get_attribute('href'))
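If you also want each category name next to its link, a variation along these lines may help. It is only a sketch: the function name category_links is hypothetical, and the selector div.subCatEntry is an assumption based on the class used in the question. It calls find_elements on each entry so that an entry without a matching anchor yields None rather than raising.

```python
# Hypothetical helper; `driver` is assumed to be a started Selenium WebDriver
# that has already navigated to the category page.
CSS_SELECTOR = "css selector"  # same string value as selenium's By.CSS_SELECTOR


def category_links(driver):
    """Return (category text, href) pairs; href is None if no anchor is found."""
    pairs = []
    for entry in driver.find_elements(CSS_SELECTOR, "div.subCatEntry"):
        # Search inside this entry only; an empty list means no a.divLink child.
        anchors = entry.find_elements(CSS_SELECTOR, "a.divLink")
        href = anchors[0].get_attribute("href") if anchors else None
        pairs.append((entry.text, href))
    return pairs
```

Searching from the entry element rather than from the driver keeps each header paired with its own link.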
Upvotes: 1