oldboy

Reputation: 5964

Relative XPath Wrongly Selects Same Element in Loop

I'm scraping some data.

One of the data points I need is the date, but the table cells containing it only include the month and day. Luckily, the year appears as a heading element above each group of tables.

For some reason year = table.find_element(...) is selecting the same element for every iteration.

I would expect year = table.find_element(...) to select unique elements relative to each unique table element as it loops through all of them, but this isn't the case.

Actual Output

# random, hypothetical values
Page #1
  element="921"
  element="921"
  element="921"
  ...
Page #2
  element="1283"
  element="1283"
  element="1283"
...

Expected Output

# random, hypothetical values
Page #1
  element="921"
  element="922"
  element="923"
  ...
Page #2
  element="1283"
  element="1284"
  element="1285"
...

How come the following code selects the same element for every iteration on each page?

# -*- coding: utf-8 -*-
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

links_sc2 = [
  'https://liquipedia.net/starcraft2/Premier_Tournaments',
  'https://liquipedia.net/starcraft2/Major_Tournaments',
  'https://liquipedia.net/starcraft2/Minor_Tournaments',
  'https://liquipedia.net/starcraft2/Minor_Tournaments/HotS',
  'https://liquipedia.net/starcraft2/Minor_Tournaments/WoL'
]
ff = webdriver.Firefox(executable_path=r'C:\WebDriver\geckodriver.exe')
urls = []

for link in links_sc2:
  # load the page before querying it
  ff.get(link)
  tables = ff.find_elements(By.XPATH, '//h2/following::table')
  for table in tables:
    try:
      # premier, major: the year is in an h3 heading
      year = table.find_element(By.XPATH, './preceding-sibling::h3/span').text
    except NoSuchElementException:
      # minor: the year is in an h2 heading
      year = table.find_element(By.XPATH, './preceding-sibling::h2/span').text
    print(year)
ff.quit()

Upvotes: 3

Views: 200

Answers (1)

Kamal

Reputation: 2554

You need to use ./preceding-sibling::h3[1]/span to get the h3 sibling nearest to the context element (your table).

The preceding-sibling axis works like this:

  • ./preceding-sibling::h3 matches every preceding h3 sibling, and find_element returns the first match in document order, which is the 2019 heading for you.

  • With a positional index, ./preceding-sibling::h3[1] returns the h3 nearest to the context element, and higher indexes move further away from it, in reverse document order. You can also use ./preceding-sibling::h3[last()] to get the farthest sibling.
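The difference between the two expressions can be sketched in plain Python, without a browser. This is only a model of the selection rules described above, not Selenium or an XPath engine: the SIBLINGS list, the year values and the helper names are all made up for illustration.

```python
# Plain-Python model of a flat sibling list; no browser or Selenium involved.
# Each tuple is (tag, text) in document order. The values are hypothetical.
SIBLINGS = [
    ("h3", "2019"), ("table", "A"),
    ("h3", "2018"), ("table", "B"),
    ("h3", "2017"), ("table", "C"),
]


def preceding_h3(siblings, table_index):
    """Texts of all h3 siblings before the table, in document order."""
    return [text for tag, text in siblings[:table_index] if tag == "h3"]


def first_in_document_order(siblings, table_index):
    # Models find_element with './preceding-sibling::h3':
    # of all matches, the first one in document order is returned.
    return preceding_h3(siblings, table_index)[0]


def nearest(siblings, table_index):
    # Models './preceding-sibling::h3[1]': position 1 on a reverse axis
    # is the closest match, i.e. the last one in document order.
    return preceding_h3(siblings, table_index)[-1]


table_indexes = [i for i, (tag, _) in enumerate(SIBLINGS) if tag == "table"]
unindexed = [first_in_document_order(SIBLINGS, i) for i in table_indexes]
indexed = [nearest(SIBLINGS, i) for i in table_indexes]

print(unindexed)  # ['2019', '2019', '2019'] -> same heading for every table
print(indexed)    # ['2019', '2018', '2017'] -> heading nearest to each table
```

The first list mirrors the repeated value from the question, and the second mirrors what the [1] index should give, assuming each group of tables directly follows its own heading.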

Upvotes: 1
