TiTo

Reputation: 865

Fetching <td> text next to <th> tag with specific text

I'd like to retrieve information about a couple of players from transfermarkt.de, e.g. Manuel Neuer's birthday. This is what the relevant HTML looks like:

<tr>
    <th>Geburtsdatum:</th>
    <td>
        <a href="/aktuell/waspassiertheute/aktuell/new/datum/1986-03-27">27.03.1986</a>                                     
    </td>
</tr>

I know I could get the date by using the following code:

soup = BeautifulSoup(source_code, "html.parser")
player_attributes = soup.find("table", class_ = 'auflistung')
rows = player_attributes.find_all('tr')
date_of_birth = re.search(r'([0-9]+\.[0-9]+\.[0-9]+)', rows[1].get_text(), re.M)[0]

but that is quite fragile. For Robert Lewandowski, for example, the date of birth is at a different position in the table, because the set of attributes shown on a player's profile varies. Is there a way to logically do

the more robust the better :)

Upvotes: 0

Views: 48

Answers (1)

Alexandra Dudkina

Reputation: 4462

BeautifulSoup can retrieve the next matching element after a given node with the find_next() method (findNext() is its older alias):

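import re

import requests
from bs4 import BeautifulSoup

html = requests.get('https://www.transfermarkt.de/manuel-neuer/profil/spieler/17259', headers={'User-Agent': 'Custom'})
# Parse the body of the response that was just fetched
soup = BeautifulSoup(html.text, "html.parser")
player_attributes = soup.find("table", class_='auflistung')
rows = player_attributes.find_all('tr')

def get_table_value(rows, table_header):
    """Return the text of the first <td> that follows a string matching table_header."""
    for row in rows:
        # find_all returns a (possibly empty) list, never None
        for helper in row.find_all(string=re.compile(table_header)):
            cell = helper.find_next('td')
            if cell is not None:
                return cell.get_text(strip=True)
    return None

print(get_table_value(rows, 'Geburtsdatum'))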
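
To try the lookup without hitting the site, here is a minimal offline sketch. It is only an illustration, not the site's real markup. SAMPLE_HTML wraps the row from the question in a table with class 'auflistung', and the first row is a made-up placeholder added to show that position does not matter. The name get_value_for_header is hypothetical. This version swaps find_next for find_next_sibling('td'), so it only looks at cells in the same row as the matching <th>.

```python
import re

from bs4 import BeautifulSoup

# Assumption: the <tr> is copied from the question; the table wrapper and
# the first row are hypothetical and exist only for this demonstration.
SAMPLE_HTML = """
<table class="auflistung">
  <tr>
    <th>Beispiel:</th>
    <td>placeholder</td>
  </tr>
  <tr>
    <th>Geburtsdatum:</th>
    <td>
      <a href="/aktuell/waspassiertheute/aktuell/new/datum/1986-03-27">27.03.1986</a>
    </td>
  </tr>
</table>
"""


def get_value_for_header(soup, header_text):
    """Return the stripped text of the <td> next to the <th> containing header_text, or None."""
    # re.escape keeps characters such as '.' or ':' in the label literal
    th = soup.find("th", string=re.compile(re.escape(header_text)))
    if th is None:
        return None
    # Stay within the same row: take the next <td> sibling of the header cell
    td = th.find_next_sibling("td")
    return td.get_text(strip=True) if td is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
birthday = get_value_for_header(soup, "Geburtsdatum")
missing = get_value_for_header(soup, "Nicht vorhanden")
print(birthday, missing)
```

Because the lookup is keyed on the header text rather than on the row index, it should keep working when a profile lists its attributes in a different order, and it returns None when an attribute is absent.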

Upvotes: 1
