Reputation: 5964
I'm scraping some data.
One of the data points I need is the date, but the table cells containing this data only include months and days. Luckily the year is used as a headline element to categorize the tables.
For some reason, year = table.find_element(...) selects the same element on every iteration.
I would expect it to select a different element relative to each table element as the loop runs through all of them, but this isn't the case.
Actual Output
# random, hypothetical values
Page #1
element="921"
element="921"
element="921"
...
Page #2
element="1283"
element="1283"
element="1283"
...
Expected Output
# random, hypothetical values
Page #1
element="921"
element="922"
element="923"
...
Page #2
element="1283"
element="1284"
element="1285"
...
How come the following code selects the same element for every iteration on each page?
# -*- coding: utf-8 -*-
from selenium import webdriver
from selenium.webdriver import Firefox
from selenium.webdriver.common.by import By

links_sc2 = [
    'https://liquipedia.net/starcraft2/Premier_Tournaments',
    'https://liquipedia.net/starcraft2/Major_Tournaments',
    'https://liquipedia.net/starcraft2/Minor_Tournaments',
    'https://liquipedia.net/starcraft2/Minor_Tournaments/HotS',
    'https://liquipedia.net/starcraft2/Minor_Tournaments/WoL'
]

ff = webdriver.Firefox(executable_path=r'C:\\WebDriver\\geckodriver.exe')
urls = []

for link in links_sc2:
    # load the page before querying it
    ff.get(link)
    tables = ff.find_elements(By.XPATH, '//h2/following::table')
    for table in tables:
        try:
            # premier, major: the year is in an h3 heading
            year = table.find_element(By.XPATH, './preceding-sibling::h3/span').text
        except:
            # minor: the year is in an h2 heading
            year = table.find_element(By.XPATH, './preceding-sibling::h2/span').text
        print(year)

ff.quit()
Upvotes: 3
Views: 200
Reputation: 2554
You need to use ./preceding-sibling::h3[1]/span to get the nearest h3 sibling from the context element (your table).
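As a rough sketch of how that could look in the loop from the question: the helper below is hypothetical (the name heading_for is not from the original code), and it assumes the h2 fallback needs the same [1] index. It uses find_elements, which returns an empty list when nothing matches, so no bare except is needed. The string "xpath" is the value of Selenium's By.XPATH, written out here only so the sketch has no imports.

```python
XPATH = "xpath"  # same value as selenium.webdriver.common.by.By.XPATH


def heading_for(table):
    """Return the text of the nearest h3 (or, failing that, h2) heading before `table`.

    `table` is expected to be a Selenium WebElement (hypothetical helper).
    """
    for tag in ("h3", "h2"):
        # [1] on the reverse preceding-sibling axis selects the closest sibling.
        spans = table.find_elements(XPATH, f"./preceding-sibling::{tag}[1]/span")
        if spans:
            return spans[0].text
    return None
```

Inside the loop, year = heading_for(table) would then replace the try/except block.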
The preceding-sibling axis works like this: ./preceding-sibling::h3 matches every h3 sibling before the context element, and find_element returns the first of those matches in DOM order, which is the 2019 heading for you.
But if you use indexing, then ./preceding-sibling::h3[1] will return the nearest h3 element from the context element, and higher indexes reach the next match in reverse DOM order. You can also use ./preceding-sibling::h3[last()] to get the farthest sibling. The same indexing applies to the h2 fallback.
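To see the difference without a browser, here is a minimal sketch that emulates the axis by hand. The page fragment is hypothetical and much simpler than the real pages, and because Python's xml.etree.ElementTree does not support the preceding-sibling axis, the function preceding_h3_siblings (an illustrative name) walks the siblings manually.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment: one year heading before each table.
FRAGMENT = """
<div>
  <h3><span>2019</span></h3>
  <table id="t1"/>
  <h3><span>2018</span></h3>
  <table id="t2"/>
  <h3><span>2017</span></h3>
  <table id="t3"/>
</div>
""".strip()


def preceding_h3_siblings(parent, node):
    """Return the h3 siblings before `node`, nearest first (reverse-axis order)."""
    children = list(parent)
    before = children[:children.index(node)]
    return [child for child in reversed(before) if child.tag == "h3"]


root = ET.fromstring(FRAGMENT)
rows = []
for table in root.findall("table"):
    siblings = preceding_h3_siblings(root, table)
    unindexed = siblings[-1].find("span").text  # first match in DOM order
    nearest = siblings[0].find("span").text     # like h3[1]
    farthest = siblings[-1].find("span").text   # like h3[last()]
    rows.append((table.get("id"), unindexed, nearest, farthest))
    print(table.get("id"), unindexed, nearest, farthest)
```

For every table the unindexed pick is the 2019 heading, while the nearest one changes per table (2019, 2018, 2017), which matches the behaviour described above.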
Upvotes: 1