I'm attempting to iterate through a list of elements and print their text. However, when I select an element inside another element, Selenium returns the element inside the first sibling element and NOT the one inside the element I'm actually interested in, which is incredibly odd and frustrating. This is the website I'm trying to scrape: https://www.thecompleteuniversityguide.co.uk/courses/details/computing-science-with-a-year-in-industry-bsc/54983514 I'm looking at the modules section. The key part of my code:
import time
from selenium.webdriver import Chrome
from selenium.webdriver.chrome.options import Options

opts = Options()
opts.add_argument('--headless')
driver = Chrome(executable_path = 'D:\Programs\Python\chromedriver.exe', options = opts)
driver.get("https://www.thecompleteuniversityguide.co.uk/courses/details/computing-science-with-a-year-in-industry-bsc/54983514")
closeButton = driver.find_element_by_xpath("//a[@id='closeFilter']")
closeButton.click()
driver.find_element_by_xpath("//a[@id='acceptCookie']").click()
modules_container = driver.find_element_by_xpath("//div[@data-sub-sec='Modules']").find_element_by_class_name("cdsb_rt")
numberOfModulesByYear = len(modules_container.find_elements_by_xpath("//div[@class='mdldv']"))
previousNumberOfModules = 0
for moduleYear in range(numberOfModulesByYear):
    moduleYearButtonString = "//div[@class='mdldv' and @data-module-sections='{}']".format(str(moduleYear))
    module_year = modules_container.find_element_by_xpath(moduleYearButtonString)
    module_year_a = module_year.find_element_by_tag_name("a")
    time.sleep(0.5)
    while module_year_a.find_element_by_tag_name("span").get_attribute("class") == "icon icon-add":
        module_year_a.click()
    while len(module_year.find_elements_by_xpath("//div[@class='mdiv']")) - previousNumberOfModules == 0:
        time.sleep(0.01)
    listOfModules = module_year.find_elements_by_xpath("//div[@class='mdiv']")
    previousNumberOfModules = len(module_year.find_elements_by_xpath("//div[@class='mdiv']"))
    for _, module in enumerate(listOfModules):
        print(module.find_element_by_tag_name("a").find_element_by_xpath("//span[@class='mdltxt']").get_attribute("outerHTML"))
    print("\n")
The output I get is:
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
This doesn't make any sense to me. When I check the HTML of the "a" element it shows the correct name, but when I try to access the span inside it via the XPath function it returns the wrong one. Can anyone help figure out why this happens? It seems incredibly unintuitive if this is the intended behaviour.
Edit:
For anyone reading this in the future: I did more research on XPath, and after looking at way too many websites explaining this, the answer is that if you want to search only inside the current node (its children and deeper descendants), start the XPath with ".//". The full stop refers to the current element, and the // means "any descendant, at any depth". Without the full stop, an XPath starting with // is evaluated from the root of the whole document, even when it is called on an element, which is why the first match on the page came back every time.
Not a bug, just a small syntax detail which can be scary to people who are new to this kind of stuff. Best of luck to everyone doing this!
Explanations: What is the difference between .// and //* in XPath?
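To make the difference concrete without needing a browser, here is a minimal sketch that uses the standard library's xml.etree.ElementTree as a stand-in for Selenium. The markup is a made-up simplification of the page, and ElementTree only supports a subset of XPath (it rejects a leading // on an element), so the document-wide search is simulated by searching from the root element. Treat it as an illustration of the scoping rule rather than of Selenium itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page is nested more deeply.
PAGE = """
<div class="cdsb_rt">
  <div class="mdiv"><a><span class="mdltxt">Programming 1</span></a></div>
  <div class="mdiv"><a><span class="mdltxt">Database Systems</span></a></div>
  <div class="mdiv"><a><span class="mdltxt">Web-Based Programming</span></a></div>
</div>
"""

root = ET.fromstring(PAGE)
modules = root.findall(".//div[@class='mdiv']")

# Searching from the document root ignores which module the loop is on,
# so every iteration yields the first span in the document. This mirrors
# what Selenium does when the XPath passed to an element starts with "//".
from_root = [root.find(".//span[@class='mdltxt']").text for _ in modules]

# Searching relative to each module (the ".//" form) yields that module's
# own span.
from_module = [m.find(".//span[@class='mdltxt']").text for m in modules]

print(from_root)    # ['Programming 1', 'Programming 1', 'Programming 1']
print(from_module)  # ['Programming 1', 'Database Systems', 'Web-Based Programming']
```

In the Selenium code above, the equivalent change would be passing ".//span[@class='mdltxt']" instead of "//span[@class='mdltxt']" to find_element_by_xpath.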
Upvotes: 0
Views: 147
Reputation: 298
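This looks like an XPath scoping problem: an expression that starts with // is evaluated from the document root even when it is called on an element, so it returns the first match on the page. When I use the class name to find the element instead, the search is limited to that element and it works: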
print(module.find_element_by_tag_name("a").find_element_by_class_name('mdltxt').get_attribute("outerHTML"))
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Database Systems (20 credits) - Core</span>
<span class="mdltxt">Web-Based Programming (20 credits) - Core</span>
<span class="mdltxt">Systems Development (20 credits) - Core</span>
<span class="mdltxt">Computing Principles (20 credits) - Core</span>
<span class="mdltxt">Programming 1 (20 credits) - Core</span>
<span class="mdltxt">Database Systems (20 credits) - Core</span>
<span class="mdltxt">Web-Based Programming (20 credits) - Core</span>
<span class="mdltxt">Systems Development (20 credits) - Core</span>
<span class="mdltxt">Computing Principles (20 credits) - Core</span>
<span class="mdltxt">Software Engineering 1 (20 credits) - Core</span>
<span class="mdltxt">Programming 2 (20 credits) - Core</span>
<span class="mdltxt">Architectures and Operating Systems (20 credits) - Core</span>
<span class="mdltxt">Data Structures and Algorithms (20 credits) - Core</span>
<span class="mdltxt">Year in Industry (80 credits) - Core</span>
<span class="mdltxt">Industrial Project Report (40 credits) - Core</span>
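Note that the first five modules appear twice in the output above. That is likely because the outer lookup, module_year.find_elements_by_xpath("//div[@class='mdiv']"), still starts with // and so still matches every module on the page rather than only those in the current year. One way to avoid forgetting the leading dot is a small helper like the hypothetical scoped() below. This is a sketch, not part of Selenium, and it only handles expressions that begin with a slash (not ones wrapped in parentheses, for example).

```python
def scoped(xpath: str) -> str:
    """Return xpath rewritten so it is evaluated relative to the context element.

    An expression starting with "/" or "//" is absolute (document-wide);
    prefixing it with "." anchors it at the element it is called on.
    Expressions that are already relative are returned unchanged.
    """
    if xpath.startswith("/"):
        return "." + xpath
    return xpath


print(scoped("//div[@class='mdiv']"))      # .//div[@class='mdiv']
print(scoped(".//span[@class='mdltxt']"))  # .//span[@class='mdltxt']

# Assumed usage with the Selenium 3 calls from the question (not run here):
#   listOfModules = module_year.find_elements_by_xpath(scoped("//div[@class='mdiv']"))
#   span = module.find_element_by_xpath(scoped("//span[@class='mdltxt']"))
```

With every nested lookup scoped this way, each year's loop should only see its own modules.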
Upvotes: 0