BeautifulSoup: extracting td list in table

Question

I'm stuck with a BeautifulSoup problem that I think is simple but I can't seem to solve. It is about extracting each td from the following table to create a loop and a list:




Team         Name      Number  Tipo              Motivo                                           Minute  Bloque
-----------  --------  ------  ----------------  -----------------------------------------------  ------  --------------
Barcelona    Player 1  16      Tarjeta Amarilla  Derribar a un contrario en la disputa del balón  88      Segundo tiempo
Real Madrid  Player 2  8       Tarjeta Amarilla  Sujetar a un adversario impidiendo su avance.    12      Primer tiempo

What I need is to create a dictionary with some elements of each tr to create a dataframe later. I would like to have a list with:

Team: Barcelona
Name: Player 1
Number: 16
Minute: 88
Team: Real Madrid
Name: Player 2
Number: 8
Minute: 12

As you can see, there are some tds that I don't need, and I'd like to skip them in my final df.

I've tried the following code (a simplified example), but it doesn't work because I always get the team name in both fields:

tabla = amonestaciones.find('table', class_='tabla-clasificacion-home marratua tablageneral tabla-actas')

rows = tabla.find_all('tr')

for row in rows:
    team = row.find('td')
    name = row.findNext('td')
    lista = {
        "Team": team,
        "Name": name
    }

This is the output I get (I would also like to strip the HTML tags, but when I try .text or .get_text() I get the error 'NoneType' object has no attribute 'text'):

{'Team': Real Madrid, 'Name': Real Madrid}

I sense that I'm very close to the solution but I am stuck and I can't move forward. Thanks in advance for your help!

baduker · Accepted Answer

If you feel like learning something new, you don't even need to call bs4 yourself (well, sort of: with flavor="bs4", pandas still uses it as the parser under the hood). All you need is pandas, which gives you a dataframe out of the box, to get this:

-  -----------  --------  --  ----------------  -----------------------------------------------  --  --------------
0  Barcelona    Player 1  16  Tarjeta Amarilla  Derribar a un contrario en la disputa del balón  88  Segundo tiempo
1  Real Madrid  Player 2   8  Tarjeta Amarilla  Sujetar a un adversario impidiendo su avance.    12  Primer tiempo
-  -----------  --------  --  ----------------  -----------------------------------------------  --  --------------

With this:

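from io import StringIO

import pandas as pd
from tabulate import tabulate

# NOTE: the tags below are a reconstruction of the question's table;
# the original markup was not preserved, so treat it as an assumption.
sample_html = """
<table>
  <thead>
    <tr>
      <th>Team</th>
      <th>Name</th>
      <th>Number</th>
      <th>Tipo</th>
      <th>Motivo</th>
      <th>Minute</th>
      <th>Bloque</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Barcelona</td>
      <td>Player 1</td>
      <td>16</td>
      <td>Tarjeta Amarilla</td>
      <td>Derribar a un contrario en la disputa del balón</td>
      <td>88</td>
      <td>Segundo tiempo</td>
    </tr>
    <tr>
      <td>Real Madrid</td>
      <td>Player 2</td>
      <td>8</td>
      <td>Tarjeta Amarilla</td>
      <td>Sujetar a un adversario impidiendo su avance.</td>
      <td>12</td>
      <td>Primer tiempo</td>
    </tr>
  </tbody>
</table>
"""

# read_html returns a list of dataframes, one per <table> found.
# Wrapping the string in StringIO avoids the literal-HTML deprecation warning in recent pandas.
df = pd.read_html(StringIO(sample_html), flavor="bs4")
df = pd.concat(df)  # merge the list into a single dataframe
print(tabulate(df))
df.to_csv("your_table.csv", index=False)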
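
To keep only the four fields the question asks for (Team, Name, Number, Minute), one option is to select those columns by name and turn each row into a dictionary. This is just a sketch: the DataFrame below is a hand-built stand-in for the one pd.read_html returns, and it assumes the header row was parsed into those column names. The names wanted, subset and records are mine, not part of the original code.

```python
import pandas as pd

# Stand-in for the dataframe returned by pd.read_html (assumption:
# the header row became the column names).
df = pd.DataFrame(
    [
        ["Barcelona", "Player 1", 16, "Tarjeta Amarilla",
         "Derribar a un contrario en la disputa del balón", 88, "Segundo tiempo"],
        ["Real Madrid", "Player 2", 8, "Tarjeta Amarilla",
         "Sujetar a un adversario impidiendo su avance.", 12, "Primer tiempo"],
    ],
    columns=["Team", "Name", "Number", "Tipo", "Motivo", "Minute", "Bloque"],
)

# Keep only the columns of interest; the others are simply skipped.
wanted = ["Team", "Name", "Number", "Minute"]
subset = df[wanted]

# One dictionary per table row, e.g. for further processing.
records = subset.to_dict("records")
print(records)
```

If you only need the dataframe, subset is already it; the list of dictionaries is there in case you want the structure described in the question.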

The df.to_csv(...) call also dumps your table to a .csv file (your_table.csv).
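
If you would rather stay with BeautifulSoup, the loop in the question can probably be fixed along these lines. row.find('td') and row.findNext('td') both land on the first td of the row, which is why Team and Name come out identical; row.find_all('td') gives you every cell so you can pick them by position. The 'NoneType' error most likely comes from the header row, which (I'm assuming) holds th cells and no td. The markup below is a hypothetical stand-in for the real page, and wanted and records are names I made up for the sketch.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (assumption: header uses <th>).
html = """
<table>
  <thead>
    <tr><th>Team</th><th>Name</th><th>Number</th><th>Tipo</th>
        <th>Motivo</th><th>Minute</th><th>Bloque</th></tr>
  </thead>
  <tbody>
    <tr><td>Barcelona</td><td>Player 1</td><td>16</td><td>Tarjeta Amarilla</td>
        <td>Derribar a un contrario en la disputa del balón</td><td>88</td>
        <td>Segundo tiempo</td></tr>
    <tr><td>Real Madrid</td><td>Player 2</td><td>8</td><td>Tarjeta Amarilla</td>
        <td>Sujetar a un adversario impidiendo su avance.</td><td>12</td>
        <td>Primer tiempo</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Field name -> position of the td inside each row; unwanted cells are skipped.
wanted = {"Team": 0, "Name": 1, "Number": 2, "Minute": 5}

records = []
for row in soup.find("table").find_all("tr"):
    cells = row.find_all("td")
    if not cells:
        # Header row: it has no <td>, so there is nothing to extract.
        continue
    # get_text(strip=True) returns the cell text without the surrounding tags.
    records.append({key: cells[i].get_text(strip=True) for key, i in wanted.items()})

print(records)
```

Appending to records inside the loop also avoids overwriting the dictionary on every iteration, and pd.DataFrame(records) should then give you the dataframe directly.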
