nokvk

Reputation: 353

Beautifulsoup: how to iterate a table

I am trying to extract data from a dynamic table with the following structure:

Team 1 - Score - Team 2 - Minute first goal.

It is a table of soccer match results, with about 10 matches per table and one table per matchday. This is an example of the website I'm working with: https://www.resultados-futbol.com/premier/grupo1/jornada1

For this I am using web scraping with BeautifulSoup in Python. Although I've made good progress, I'm running into a problem. I would like to write code that iterates over each row of the table, cell by cell, and appends each value to a list, so that I would have, for example:

List Team 1: Real Madrid, Barcelona
Score list: 1-0, 1-0
List Team 2: Atletico Madrid, Sevilla
First goal minutes list: 17', 64'

Once I have the lists, I intend to build a complete dataframe from all the extracted data. However, I have a problem with matches that end 0-0. In that case the 'Minute first goal' column is empty and nothing is extracted, so I cannot fill in that value in my dataframe and I get an error. To continue the previous example, imagine that the second game ended 0-0, so the 'First goal minutes' list contains only one value (17').

In my mind, the solution would be a loop that reads the data cell by cell, with a condition on 'Score': if it is 0-0, a placeholder such as 'No goals' is appended to the first goal minutes list.

This is the code I am using. I paste only the part in which I would like to create the loop:

page = BeautifulSoup(driver.page_source, 'html.parser') # Selenium is used first because some buttons on the page have to be expanded
table = page.find('div', class_= 'contentitem').find_all('tr', class_= 'vevent')

teams1 = []
teams2 = []
scores = []

for cell in table:
    team1 = cell.find('td', class_='team1')
    for name in team1:
        nteam1 = name.text
        teams1.append(nteam1)
        
    team2 = cell.find('td', class_='team2')
    for name in team2:
        nteam2 = name.text
        teams2.append(nteam2)
        
    score = cell.find('span', class_='clase')
    for name in score:
        nscore = name.text
        scores.append(nscore)

It is not clear to me how to iterate over the table so that the content of each cell is stored in its list. It is essential to include a condition such as "when the score cell is 0-0, add a 'No goals' entry to the list".

If someone could help me, I would be very grateful. Best regards

Upvotes: 0

Views: 82

Answers (1)

HedgeHog

Reputation: 25048

You are close to your goal, but you can optimize your script a bit.

  1. Do not use these different lists, just use one:

    data = []
    
  2. Try to get all the information in one loop. There is a td that contains all the information; push a dict to your list:

    for row in soup.select('tr.vevent .rstd'):
        teams = row.select_one('.summary').get_text().split(' - ')
        score = row.select_one('.clase').get_text()
    
        data.append({
            'team1': teams[0],
            'team2': teams[1],
            'score': score if score != '0-0' else 'No goals'
        })
    
  3. Push your data into a DataFrame:

    pd.DataFrame(data)
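
If you would rather keep the real score and store the first goal in its own column, the same loop can guard against the missing cell. This is only a rough sketch on invented sample markup: the classes vevent, rstd, summary and clase are the ones used above, while the `first-goal` class and the layout of the rows are assumptions made for illustration, so the selector has to be adapted to the real page.

```python
from bs4 import BeautifulSoup

# Invented sample markup. "first-goal" is a hypothetical class name;
# the real page may mark the first-goal minute differently.
html = """
<table>
  <tr class="vevent">
    <td class="rstd">
      <span class="summary">Real Madrid - Atletico Madrid</span>
      <span class="clase">1-0</span>
    </td>
    <td class="first-goal">17'</td>
  </tr>
  <tr class="vevent">
    <td class="rstd">
      <span class="summary">Barcelona - Sevilla</span>
      <span class="clase">0-0</span>
    </td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
data = []

for row in soup.select("tr.vevent"):
    team1, team2 = row.select_one(".summary").get_text(strip=True).split(" - ")
    score = row.select_one(".clase").get_text(strip=True)

    # select_one returns None when the element is absent (here: the 0-0 match),
    # so a placeholder is stored instead of skipping the row.
    goal_cell = row.select_one(".first-goal")
    first_goal = goal_cell.get_text(strip=True) if goal_cell else "No goals"

    data.append({
        "team1": team1,
        "team2": team2,
        "score": score,
        "first_goal": first_goal,
    })

print(data)
```

Because every row produces exactly one dict, each match always has a value for every key, whether or not a goal was scored.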
    

Example

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from bs4 import BeautifulSoup
import pandas as pd

# Selenium 4 passes the driver path through Service; older releases used executable_path
driver = webdriver.Chrome(service=Service(r'C:\Program Files\ChromeDriver\chromedriver.exe'))
url = 'https://www.resultados-futbol.com/premier/grupo1/jornada1'
driver.get(url)

# Parse the HTML rendered by Selenium (needed because some buttons on the page must be expanded first)
soup = BeautifulSoup(driver.page_source, 'html.parser')

data = []

for row in soup.select('tr.vevent .rstd'):
    teams = row.select_one('.summary').get_text().split(' - ')
    score = row.select_one('.clase').get_text()
    
    data.append({
        'team1':teams[0],
        'team2':teams[1],
        'score': score if score != '0-0' else 'No goals'
    })

df = pd.DataFrame(data)
print(df)
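
As a side note on why a single list of dicts is safer than separate lists: the sketch below uses the example values from the question (not scraped data) and assumes the error in the question comes from lists of different lengths. pandas refuses to build a DataFrame from columns of unequal length, whereas a row whose key is missing simply gets NaN in that column.

```python
import pandas as pd

# Parallel lists as in the question: the 0-0 match added nothing to first_goals.
teams1 = ["Real Madrid", "Barcelona"]
scores = ["1-0", "0-0"]
first_goals = ["17'"]

try:
    pd.DataFrame({"team1": teams1, "score": scores, "first_goal": first_goals})
    error = None
except ValueError as exc:
    # Columns of different lengths cannot be aligned into one table.
    error = str(exc)
print(error)

# One dict per match: a missing key becomes NaN instead of raising.
rows = [
    {"team1": "Real Madrid", "score": "1-0", "first_goal": "17'"},
    {"team1": "Barcelona", "score": "0-0"},
]
df = pd.DataFrame(rows)
print(df)
```

The NaN can then be replaced afterwards, for example with `df['first_goal'].fillna('No goals')`.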

Upvotes: 1
