Vasilis Skentos

Reputation: 556

Iterate through an HTML table with names embedded in link tags

I am a beginner in web scraping with Beautiful Soup, and I am trying to get the EuroLeague teams with their wins and losses from https://www.basketball-reference.com/international/euroleague/2020.html.

I want to iterate through this table, get each team's name, wins, and losses, and later write them to a list, a CSV file, or a JSON file. With the code below I only get the HTML of the first row in the table, even though I use a for loop:

from bs4 import BeautifulSoup as bs
import requests
import pandas as pd
import json
import time



url = 'https://www.basketball-reference.com/international/euroleague/2020.html' 

time.sleep(2)
source = requests.get(url).text

time.sleep(4)
soup = bs(source,'lxml')

time.sleep(2)
for item in soup.find_all('div' , class_='table_outer_container'):
    #prints only first item
    team=item.div.table.tbody.tr
    print(team)

The structure of the table element:

<div class="table_outer_container">
      <div class="overthrow table_container" id="div_elg_standings">
      
  <table class="sortable stats_table now_sortable" id="elg_standings" data-cols-to-freeze="1"><caption>EuroLeague Standings Table</caption>
   <colgroup><col><col><col></colgroup>
   <thead>
      
      <tr class="over_header"><th></th>
         <th aria-label="" data-stat="Regular Season" colspan="2" class=" over_header center">Regular Season</th>
      </tr>
      

      
      <tr>
         <th aria-label="&nbsp;" data-stat="team" scope="col" class=" poptip center">&nbsp;</th>
         <th aria-label="Wins" data-stat="wins|Regular Season" scope="col" class=" poptip right" data-tip="Wins" data-over-header="Regular Season">W</th>
         <th aria-label="Losses" data-stat="losses|Regular Season" scope="col" class=" poptip right" data-tip="Losses" data-over-header="Regular Season">L</th>
      </tr>
      
   </thead>
   <tbody>
<tr data-row="0"><th scope="row" class="left " data-stat="team"><a href="/international/teams/anadolu-efes/2020.html">Anadolu Efes</a></th><td class="right " data-stat="wins|Regular Season">24</td><td class="right " data-stat="losses|Regular Season">4</td></tr>
<tr data-row="1"><th scope="row" class="left " data-stat="team"><a href="/international/teams/real-madrid/2020.html">Real Madrid</a></th><td class="right " data-stat="wins|Regular Season">22</td><td class="right " data-stat="losses|Regular Season">6</td></tr>
<tr data-row="2"><th scope="row" class="left " data-stat="team"><a href="/international/teams/barcelona/2020.html">FC Barcelona</a></th><td class="right " data-stat="wins|Regular Season">22</td><td class="right " data-stat="losses|Regular Season">6</td></tr>
<tr data-row="3"><th scope="row" class="left " data-stat="team"><a href="/international/teams/cska-moscow/2020.html">CSKA Moscow</a></th><td class="right " data-stat="wins|Regular Season">19</td><td class="right " data-stat="losses|Regular Season">9</td></tr>
<tr data-row="4"><th scope="row" class="left " data-stat="team"><a href="/international/teams/maccabi-tel-aviv/2020.html">Maccabi FOX Tel Aviv</a></th><td class="right " data-stat="wins|Regular Season">19</td><td class="right " data-stat="losses|Regular Season">9</td></tr>
<tr data-row="5"><th scope="row" class="left " data-stat="team"><a href="/international/teams/panathinaikos/2020.html">Panathinaikos OPAP</a></th><td class="right " data-stat="wins|Regular Season">14</td><td class="right " data-stat="losses|Regular Season">14</td></tr>
<tr data-row="6"><th scope="row" class="left " data-stat="team"><a href="/international/teams/ulker-fenerbahce/2020.html">Fenerbahçe Beko</a></th><td class="right " data-stat="wins|Regular Season">13</td><td class="right " data-stat="losses|Regular Season">15</td></tr>
<tr data-row="7"><th scope="row" class="left " data-stat="team"><a href="/international/teams/khimki/2020.html">Khimki</a></th><td class="right " data-stat="wins|Regular Season">13</td><td class="right " data-stat="losses|Regular Season">15</td></tr>
<tr data-row="8"><th scope="row" class="left " data-stat="team"><a href="/international/teams/vitoria/2020.html">Kirolbet Baskonia</a></th><td class="right " data-stat="wins|Regular Season">12</td><td class="right " data-stat="losses|Regular Season">16</td></tr>
<tr data-row="9"><th scope="row" class="left " data-stat="team"><a href="/international/teams/olympiakos/2020.html">Olympiacos</a></th><td class="right " data-stat="wins|Regular Season">12</td><td class="right " data-stat="losses|Regular Season">16</td></tr>
<tr data-row="10"><th scope="row" class="left " data-stat="team"><a href="/international/teams/zalgiris/2020.html">Žalgiris</a></th><td class="right " data-stat="wins|Regular Season">12</td><td class="right " data-stat="losses|Regular Season">16</td></tr>
<tr data-row="11"><th scope="row" class="left " data-stat="team"><a href="/international/teams/valencia/2020.html">Valencia Basket</a></th><td class="right " data-stat="wins|Regular Season">12</td><td class="right " data-stat="losses|Regular Season">16</td></tr>
<tr data-row="12"><th scope="row" class="left " data-stat="team"><a href="/international/teams/milano/2020.html">AX Armani Exchange Olimpia</a></th><td class="right " data-stat="wins|Regular Season">12</td><td class="right " data-stat="losses|Regular Season">16</td></tr>
<tr data-row="13"><th scope="row" class="left " data-stat="team"><a href="/international/teams/red-star/2020.html">Crvena zvezda mts</a></th><td class="right " data-stat="wins|Regular Season">11</td><td class="right " data-stat="losses|Regular Season">17</td></tr>
<tr data-row="14"><th scope="row" class="left " data-stat="team"><a href="/international/teams/villeurbanne/2020.html">LDLC ASVEL</a></th><td class="right " data-stat="wins|Regular Season">10</td><td class="right " data-stat="losses|Regular Season">18</td></tr>
<tr data-row="15"><th scope="row" class="left " data-stat="team"><a href="/international/teams/alba-berlin/2020.html">Alba Berlin</a></th><td class="right " data-stat="wins|Regular Season">9</td><td class="right " data-stat="losses|Regular Season">19</td></tr>
<tr data-row="16"><th scope="row" class="left " data-stat="team"><a href="/international/teams/triumph-moscow/2020.html">Zenit Saint Petersburg</a></th><td class="right " data-stat="wins|Regular Season">8</td><td class="right " data-stat="losses|Regular Season">20</td></tr>
<tr data-row="17"><th scope="row" class="left " data-stat="team"><a href="/international/teams/bayern-muenchen/2020.html">Bayern Munich</a></th><td class="right " data-stat="wins|Regular Season">8</td><td class="right " data-stat="losses|Regular Season">20</td></tr>

</tbody></table>

      </div>
   </div>

I would appreciate your help with guiding me to iterate through this element correctly and get the team name, wins, and losses. Thank you in advance.

Upvotes: 1

Views: 58

Answers (1)

Philippe Remy

Reputation: 3123

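Your loop runs over the `table_outer_container` divs, and `item.div.table.tbody.tr` always returns only the first `<tr>` inside a container, so you never reach the other rows. Instead, iterate over the team links and read the wins and losses from the row each link sits in. Try this: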

Code

import requests
from bs4 import BeautifulSoup

url = 'https://www.basketball-reference.com/international/euroleague/2020.html'

soup = BeautifulSoup(requests.get(url).text, 'html.parser')
teams = soup.find('div', class_='table_outer_container')

for team in teams.find_all('a'):
    # each <a> is a team link; team.parent is the <th> and team.parent.parent is the <tr>
    team_name = team.text
    wins = team.parent.parent.find('td', {'data-stat': 'wins|Regular Season'}).text
    losses = team.parent.parent.find('td', {'data-stat': 'losses|Regular Season'}).text
    print(team_name, wins, losses)

Output

Anadolu Efes 24 4
Real Madrid 22 6
FC Barcelona 22 6
CSKA Moscow 19 9
Maccabi FOX Tel Aviv 19 9
Panathinaikos OPAP 14 14
Fenerbahçe Beko 13 15
Khimki 13 15
Kirolbet Baskonia 12 16
Olympiacos 12 16
Žalgiris 12 16
Valencia Basket 12 16
AX Armani Exchange Olimpia 12 16
Crvena zvezda mts 11 17
LDLC ASVEL 10 18
Alba Berlin 9 19
Zenit Saint Petersburg 8 20
Bayern Munich 8 20
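
Since the question also mentions saving the results to a list, CSV, or JSON, here is a minimal sketch of one way that could look. It iterates over the `<tr>` rows directly rather than over the links. To keep it runnable offline, it parses two rows copied from the table in the question instead of fetching the page; the names `SAMPLE_HTML`, `parse_standings`, and `to_csv` are made up for this illustration, and converting wins and losses to `int` is my assumption about what you want.

```python
import csv
import io
import json

from bs4 import BeautifulSoup

# Two rows copied from the table in the question. With the live page you
# would pass requests.get(url).text to parse_standings() instead.
SAMPLE_HTML = """
<div class="table_outer_container">
<table class="sortable stats_table" id="elg_standings">
<thead><tr><th data-stat="team"></th><th>W</th><th>L</th></tr></thead>
<tbody>
<tr data-row="0"><th scope="row" class="left " data-stat="team"><a href="/international/teams/anadolu-efes/2020.html">Anadolu Efes</a></th><td class="right " data-stat="wins|Regular Season">24</td><td class="right " data-stat="losses|Regular Season">4</td></tr>
<tr data-row="1"><th scope="row" class="left " data-stat="team"><a href="/international/teams/real-madrid/2020.html">Real Madrid</a></th><td class="right " data-stat="wins|Regular Season">22</td><td class="right " data-stat="losses|Regular Season">6</td></tr>
</tbody></table>
</div>
"""


def parse_standings(html):
    """Return one dict per table row: team name, wins, losses."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="elg_standings")
    rows = []
    # Loop over every <tr> in the body, not over the outer container.
    for tr in table.tbody.find_all("tr"):
        team = tr.find("th", {"data-stat": "team"}).get_text(strip=True)
        wins = tr.find("td", {"data-stat": "wins|Regular Season"}).get_text(strip=True)
        losses = tr.find("td", {"data-stat": "losses|Regular Season"}).get_text(strip=True)
        rows.append({"team": team, "wins": int(wins), "losses": int(losses)})
    return rows


def to_csv(rows):
    """Serialize the row dicts to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["team", "wins", "losses"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


standings = parse_standings(SAMPLE_HTML)
# ensure_ascii=False keeps names such as "Žalgiris" readable in the JSON.
print(json.dumps(standings, ensure_ascii=False, indent=2))
print(to_csv(standings))
```

To write files instead of printing, you can pass the same list to `json.dump` with an open file, or give `csv.DictWriter` a file opened with `newline=''` and `encoding='utf-8'`.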

Upvotes: 1
