Baili

Reputation: 159

Extracting element after another element with xpath/beautiful soup

[Image: screenshot of the page's HTML code segment]

I'm looking for a robust way to extract both the team names and the market odds. For the code segment above, the desired output would be:

West Brom Man City 28/1 6/1 1/8

I should also mention that I only need the team names and market odds that come AFTER a given fixture ID (which is stored in the 'data-fixtureid' attribute).

I have tried the following XPath expression:

    tree.xpath('//span[@class="ippg-Market_Truncator"]/following::div[@data-fixtureid="66705048"]//text()')

to extract the team names, but it didn't produce the desired output.
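For comparison: the expression above starts at a span and then looks for a *following* div with the fixture ID, so it ends up returning text inside the fixture div rather than the team names that come after it. Below is a minimal sketch that reverses the direction (start at the fixture div, then walk the following:: axis to the team-name spans) using lxml. The markup is hypothetical, since the real HTML is only shown as an image, and it assumes the data-fixtureid div comes before the rows of interest in document order.

```python
from lxml import html

# Hypothetical markup: stands in for the real page, whose structure is unknown.
SAMPLE = """
<div>
  <div class="podEventRow">
    <div class="ippg-Market_CompetitorName">
      <span class="ippg-Market_Truncator">Arsenal</span>
      <span class="ippg-Market_Truncator">Chelsea</span>
    </div>
  </div>
  <div data-fixtureid="66705048"></div>
  <div class="podEventRow">
    <div class="ippg-Market_CompetitorName">
      <span class="ippg-Market_Truncator">West Brom</span>
      <span class="ippg-Market_Truncator">Man City</span>
    </div>
  </div>
</div>
"""


def team_names_after(tree, fixture_id):
    # Start at the fixture marker, then take only the spans that follow it.
    # The fixture ID is passed as an XPath variable ($fid) instead of being
    # formatted into the expression string.
    xpath = ('//div[@data-fixtureid=$fid]'
             '/following::span[@class="ippg-Market_Truncator"]/text()')
    return [t.strip() for t in tree.xpath(xpath, fid=fixture_id)]


tree = html.fromstring(SAMPLE)
print(team_names_after(tree, "66705048"))  # ['West Brom', 'Man City']
```

The following:: axis skips the marker's own descendants, so if the real data-fixtureid attribute sits on the row itself, a descendant step would be needed instead.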

I'd appreciate it if someone could point me in the right direction. I don't necessarily need to use XPath for this; I could also use Beautiful Soup, for example.

Upvotes: 1

Views: 750

Answers (1)

chad

Reputation: 838

This answer doesn't use XPath; instead, it uses Beautiful Soup's find_all() and find() methods to get the result you want.

First, I find all the rows you need, which have the class name podEventRow.

Second, I loop through that list, find the team name with the class ippg-Market_CompetitorName, and strip out unnecessary whitespace.

Third, inside the same loop, I find the market odds with the class name ippg-Market_Topic and then loop through the odds to get the text of each one.

    from bs4 import BeautifulSoup

    # page_source is assumed to hold the rendered HTML (e.g. driver.page_source from Selenium)
    soup = BeautifulSoup(page_source, 'html.parser')

    for row in soup.find_all('div', class_="podEventRow"):
        # Team names, with newlines/tabs collapsed to single spaces
        name_div = row.find('div', class_="ippg-Market_CompetitorName")
        team_name = ' '.join(name_div.get_text(' ').split())
        # Text of each odds cell in the row
        market_odds = ''
        for odd in row.find_all('div', class_="ippg-Market_Topic"):
            market_odds += ' - ' + odd.get_text(strip=True)
        print(team_name + market_odds)
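The loop above processes every row on the page, while the question only wants rows after a given fixture ID. One way to add that restriction is a sketch built on Beautiful Soup's find_all_next(), which walks forward in document order from a starting tag. The markup is hypothetical (the real HTML is only shown as an image), and the helper name rows_after_fixture is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: stands in for the real page, whose structure is unknown.
SAMPLE = """
<div>
  <div class="podEventRow">
    <div class="ippg-Market_CompetitorName">
      <span class="ippg-Market_Truncator">Arsenal</span>
      <span class="ippg-Market_Truncator">Chelsea</span>
    </div>
    <div class="ippg-Market_Topic">2/1</div>
  </div>
  <div data-fixtureid="66705048"></div>
  <div class="podEventRow">
    <div class="ippg-Market_CompetitorName">
      <span class="ippg-Market_Truncator">West Brom</span>
      <span class="ippg-Market_Truncator">Man City</span>
    </div>
    <div class="ippg-Market_Topic">28/1</div>
    <div class="ippg-Market_Topic">6/1</div>
    <div class="ippg-Market_Topic">1/8</div>
  </div>
</div>
"""


def rows_after_fixture(soup, fixture_id):
    """Return (team names, [odds]) for each podEventRow after the fixture marker."""
    marker = soup.find('div', attrs={'data-fixtureid': fixture_id})
    if marker is None:
        return []
    results = []
    # find_all_next() yields matching tags after the marker in document order
    # (including any inside the marker itself).
    for row in marker.find_all_next('div', class_="podEventRow"):
        name_div = row.find('div', class_="ippg-Market_CompetitorName")
        team_name = ' '.join(name_div.get_text(' ').split())
        odds = [o.get_text(strip=True)
                for o in row.find_all('div', class_="ippg-Market_Topic")]
        results.append((team_name, odds))
    return results


soup = BeautifulSoup(SAMPLE, 'html.parser')
for team_name, odds in rows_after_fixture(soup, "66705048"):
    print(team_name, *odds)  # West Brom Man City 28/1 6/1 1/8
```

Because find_all_next() also descends into the marker's children, this works whether the data-fixtureid div precedes the rows or wraps them.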

PS: I used Selenium to get the complete page source, since the site loads the table with JavaScript.

Upvotes: 1
