Leo Torres
Leo Torres

Reputation: 690

Python Beautiful soup get correct column headers for each table

The following code gets player data but each dataset is different. The first data it sees is the quarterback data, so it uses these columns for all the data going forward. How can I change the header so that for every different dataset it encounters, the correct headers are used with the correct data?

import pandas as pd
import csv
from pprint import pprint

from bs4 import BeautifulSoup
import requests

url = 'https://www.espn.com/nfl/boxscore/_/gameId/401326313'# Create object page
soup = BeautifulSoup(requests.get(url).content, "html.parser")

rows = soup.select("table.mod-data tr")
#rows = soup.find_all("table.mod-data tr")
headers = [header.get_text(strip=True).encode("utf-8") for header in rows[0].find_all("th")]

data = [dict(zip(headers, [cell.get_text(strip=True).encode("utf-8") for cell in row.find_all("td")]))
        for row in rows[1:]]

df = pd.DataFrame(data)
df.to_csv('_Data_{}.csv'.format(pd.datetime.now().strftime("%Y-%m-%d %H%M%S")),index=False)

# see what the data looks like at this point
pprint(data)

Upvotes: 5

Views: 1769

Answers (2)

Richard K Yu
Richard K Yu

Reputation: 2202

Here is my attempt. A few things to note. I am not printing to CSV but just showing you the dataframes with the correct header information, you can handle the CSV output later.

You press enter after running the program to see the next tables with different headers.

import pandas as pd
import csv
from pprint import pprint

from bs4 import BeautifulSoup
import requests


#Made this function because the rows we want for different headers are different.
def get_individual_table(headers, rows, start,end):
    '''for header in headers:
        print(header)'''


    data = [dict(zip(headers, [cell.get_text(strip=True).encode("utf-8") for cell in row.find_all("td")]))
            for row in rows[start+1:end]]

    df = pd.DataFrame(data)
    #print(df)
    return df

url = 'https://www.espn.com/nfl/boxscore/_/gameId/401326313'# Create object page
soup = BeautifulSoup(requests.get(url).content, "html.parser")

rows = soup.select("table.mod-data tr")

row_team_names = soup.select("div.team-name")

#rows = soup.find_all("table.mod-data tr")


#Only some of the row indicies have correct header information, we find them here.
correct_row_indices = []
for idx, row in enumerate(rows):
    #print(idx, row.find_all("th"))
    if len(row.find_all("th"))>0:
        correct_row_indices.append(idx)


print(correct_row_indices)
headers = [header.get_text(strip=True).encode("utf-8") for header in rows[0].find_all("th")]

team_names =[]

for name in row_team_names:
    team_names.append(name.text)
    

for i in range(len(correct_row_indices)-1):
    headers = [header.get_text(strip=True).encode("utf-8") for header in rows[correct_row_indices[i]].find_all("th")]
    #print(headers)
    

    #Use the function to get the tables individually
    df = get_individual_table(headers, rows, correct_row_indices[i], correct_row_indices[i+1])
    if not df.empty:
        df["Team Name"] = team_names[i]

    #Press enter to see each individual table.
    print(df)
    input()
    df.to_csv('_Data_{}.csv'.format(pd.datetime.now().strftime("%Y-%m-%d %H%M%S")),index=False)
    
    






#df.to_csv('_Data_{}.csv'.format(pd.datetime.now().strftime("%Y-%m-%d %H%M%S")),index=False)

# see what the data looks like at this point
#pprint(data)

Here is part of the output: enter image description here

As you can see, we get each distinct DataFrame with the different headers as necessary. Observe how each DataFrame matches the table on the ESPN site:

enter image description here

You can choose how to process them individually later in the loop logic. Not all tables are shown. Please keep pressing enter after initially running the program and all the table information on that page will eventually display.

Edit: I have changed the code to output individual CSV files for each table. enter image description here

I show the first two tables outputted above.

Upvotes: 2

HedgeHog
HedgeHog

Reputation: 25221

As mentioned expected result is not that clear, but if you just wanna read the tables use pandas.read_html to achieve your goal - index_col=0 avoids that the first column, that has no header is named Unnamed_0.

pd.read_html('https://www.espn.com/nfl/boxscore/_/gameId/401326313',index_col=0)

Example

import pandas as pd

for table in pd.read_html('https://www.espn.com/nfl/boxscore/_/gameId/401326313',index_col=0):
    pd.DataFrame(table).to_csv('_Data_{}.csv'.format(pd.datetime.today().strftime("%Y-%m-%d %H%M%S.%f")))

As alternative you can reset_index() and use to_csv(index=False):

pd.DataFrame(table).rename_axis('').reset_index().to_csv('_Data_{}.csv'.format(pd.datetime.today().strftime("%Y-%m-%d %H%M%S.%f")),index=False)

EDIT

Using captions in tables and to store results in named csv files:

import pandas as pd
from bs4 import BeautifulSoup

url = 'https://www.espn.com/nfl/boxscore/_/gameId/401326313'
soup = BeautifulSoup(requests.get(url).content, "html.parser")


for table in soup.select('article.boxscore-tabs table'):
    caption = '_'.join(table.parent.select_one('.table-caption').text.split(' '))
    df = pd.read_html(table.prettify(),index_col=0)[0].rename_axis('').reset_index()
    df.insert(0, 'caption', caption)
    df.to_csv(f'_DATA_{caption}_{pd.datetime.now().strftime("%Y-%m-%d %H%M%S")}.csv',index=False)

Output of your csv files

caption,,C/ATT,YDS,AVG,TD,INT,SACKS,QBR,RTG
Miami_Passing,Tua Tagovailoa  T. Tagovailoa,16/27,202,7.5,1,1,2-17,47.5,79.6
Miami_Passing,TEAM,16/27,185,7.5,1,1,2-17,--,79.6

caption,,C/ATT,YDS,AVG,TD,INT,SACKS,QBR,RTG
New_England_Passing,Mac Jones  M. Jones,29/39,281,7.2,1,0,1-13,76.9,102.6
New_England_Passing,TEAM,29/39,268,7.2,1,0,1-13,--,102.6

caption,,CAR,YDS,AVG,TD,LONG
Miami_Rushing,Myles Gaskin  M. Gaskin,9,49,5.4,0,15
Miami_Rushing,Malcolm Brown  M. Brown,5,16,3.2,0,5
Miami_Rushing,Jacoby Brissett  J. Brissett,2,4,2.0,0,2
Miami_Rushing,Salvon Ahmed  S. Ahmed,3,4,1.3,0,8
Miami_Rushing,Tua Tagovailoa  T. Tagovailoa,4,1,0.3,1,3
Miami_Rushing,TEAM,23,74,3.2,1,15

caption,,CAR,YDS,AVG,TD,LONG
New_England_Rushing,Damien Harris  D. Harris,23,100,4.3,0,35
New_England_Rushing,James White  J. White,4,12,3.0,0,10
New_England_Rushing,Jonnu Smith  J. Smith,1,6,6.0,0,6
New_England_Rushing,Brandon Bolden  B. Bolden,1,5,5.0,0,5
New_England_Rushing,Rhamondre Stevenson  R. Stevenson,1,2,2.0,0,2
New_England_Rushing,TEAM,30,125,4.2,0,35

caption,,REC,YDS,AVG,TD,LONG,TGTS
Miami_Receiving,DeVante Parker  D. Parker,4,81,20.3,0,30,7
Miami_Receiving,Jaylen Waddle  J. Waddle,4,61,15.3,1,36,5
Miami_Receiving,Myles Gaskin  M. Gaskin,5,27,5.4,0,12,5
Miami_Receiving,Salvon Ahmed  S. Ahmed,2,24,12.0,0,18,3
Miami_Receiving,Durham Smythe  D. Smythe,1,9,9.0,0,9,2
Miami_Receiving,Albert Wilson  A. Wilson,0,0,0.0,0,0,2
Miami_Receiving,Mike Gesicki  M. Gesicki,0,0,0.0,0,0,3
Miami_Receiving,TEAM,16,202,12.6,1,36,27

caption,,REC,YDS,AVG,TD,LONG,TGTS
New_England_Receiving,Nelson Agholor  N. Agholor,5,72,14.4,1,25,7
New_England_Receiving,James White  J. White,6,49,8.2,0,26,7
New_England_Receiving,Jakobi Meyers  J. Meyers,6,44,7.3,0,22,9
New_England_Receiving,Jonnu Smith  J. Smith,5,42,8.4,0,11,5
New_England_Receiving,Hunter Henry  H. Henry,3,31,10.3,0,16,3
New_England_Receiving,Kendrick Bourne  K. Bourne,1,17,17.0,0,17,3
New_England_Receiving,Damien Harris  D. Harris,2,17,8.5,0,9,3
New_England_Receiving,Rhamondre Stevenson  R. Stevenson,1,9,9.0,0,9,1
New_England_Receiving,TEAM,29,281,9.7,1,26,38

caption,,FUM,LOST,REC
Miami_Fumbles,Xavien Howard  X. Howard,0,0,1
Miami_Fumbles,Zach Sieler  Z. Sieler,0,0,1
Miami_Fumbles,TEAM,0,0,2

caption,,FUM,LOST,REC
New_England_Fumbles,David Andrews  D. Andrews,0,0,1
New_England_Fumbles,Damien Harris  D. Harris,1,1,0
New_England_Fumbles,Rhamondre Stevenson  R. Stevenson,1,1,0
New_England_Fumbles,Jonnu Smith  J. Smith,1,0,1
New_England_Fumbles,Mac Jones  M. Jones,1,0,0
New_England_Fumbles,TEAM,4,2,2

caption,,tackles,tackles,tackles,tackles,misc,misc,misc,misc
,,TOT,SOLO,SACKS,TFL,PD,QB HTS,TD,Unnamed: 8_level_1
Miami_Defensive,Jerome Baker  J. Baker,12,9,0,0,0,0,0,
Miami_Defensive,Eric Rowe  E. Rowe,9,6,0,0,0,0,0,
Miami_Defensive,Byron Jones  B. Jones,6,5,0,0,1,0,0,
Miami_Defensive,Nik Needham  N. Needham,6,5,0,0,0,0,0,
Miami_Defensive,Sam Eguavoen  S. Eguavoen,6,2,0,0,0,3,0,
Miami_Defensive,Xavien Howard  X. Howard,5,4,0,0,0,0,0,
Miami_Defensive,Jason McCourty  J. McCourty,5,3,0,0,1,0,0,
Miami_Defensive,Brennan Scarlett  B. Scarlett,5,2,0,0,1,1,0,
Miami_Defensive,Andrew Van Ginkel  A. Van Ginkel,5,2,0,0,0,1,0,
Miami_Defensive,John Jenkins  J. Jenkins,4,4,0,0,0,0,0,
Miami_Defensive,Emmanuel Ogbah  E. Ogbah,3,3,0,1,1,1,0,
Miami_Defensive,Zach Sieler  Z. Sieler,3,2,0,1,0,0,0,
Miami_Defensive,Christian Wilkins  C. Wilkins,3,2,0,0,0,1,0,
Miami_Defensive,Elandon Roberts  E. Roberts,2,2,0,0,1,1,0,
Miami_Defensive,Jamal Perry  J. Perry,2,2,0,0,0,0,0,
Miami_Defensive,Brandon Jones  B. Jones,2,2,0,0,0,0,0,
Miami_Defensive,Jevon Holland  J. Holland,2,2,0,0,0,0,0,
Miami_Defensive,Adam Butler  A. Butler,2,1,0,0,0,0,0,
Miami_Defensive,Mack Hollins  M. Hollins,2,0,0,0,0,0,0,
Miami_Defensive,Team  Team,1,1,1,0,0,0,0,
Miami_Defensive,Mike Gesicki  M. Gesicki,1,1,0,0,0,0,0,
Miami_Defensive,Durham Smythe  D. Smythe,1,0,0,0,0,0,0,
Miami_Defensive,Jaelan Phillips  J. Phillips,0,0,0,0,0,1,0,
Miami_Defensive,TEAM,87,60,1,2,5,9,0,

caption,,tackles,tackles,tackles,tackles,misc,misc,misc,misc
,,TOT,SOLO,SACKS,TFL,PD,QB HTS,TD,Unnamed: 8_level_1
New_England_Defensive,Kyle Dugger  K. Dugger,7,6,0,1,0,0,0,
New_England_Defensive,Devin McCourty  D. McCourty,7,4,0,0,0,0,0,
New_England_Defensive,Ja'Whaun Bentley  J. Bentley,4,4,0,1,0,0,0,
New_England_Defensive,Matthew Judon  M. Judon,4,3,0,1,0,1,0,
New_England_Defensive,Lawrence Guy  L. Guy,4,2,0,0,0,1,0,
New_England_Defensive,Dont'a Hightower  D. Hightower,4,2,0,0,0,0,0,
New_England_Defensive,J.C. Jackson  J.C. Jackson,3,3,0,0,1,0,0,
New_England_Defensive,Kyle Van Noy  K. Van Noy,3,2,1,1,1,1,0,
New_England_Defensive,Adrian Phillips  A. Phillips,3,2,0,2,0,0,0,
New_England_Defensive,Davon Godchaux  D. Godchaux,3,2,0,0,0,0,0,
New_England_Defensive,Jalen Mills  J. Mills,2,2,0,0,1,0,0,
New_England_Defensive,Josh Uche  J. Uche,1,1,1,1,0,1,0,
New_England_Defensive,Carl Davis  C. Davis,1,1,0,0,0,0,0,
New_England_Defensive,Chase Winovich  C. Winovich,1,1,0,0,0,0,0,
New_England_Defensive,Joejuan Williams  J. Williams,1,1,0,0,0,0,0,
New_England_Defensive,Christian Barmore  C. Barmore,1,0,0,0,0,0,0,
New_England_Defensive,Jonathan Jones  J. Jones,0,0,0,0,1,0,0,
New_England_Defensive,TEAM,49,36,2,7,4,4,0,

caption,,INT,YDS,TD
Miami_Interceptions,No Miami Interceptions,,,

caption,,INT,YDS,TD
New_England_Interceptions,Jonathan Jones  J. Jones,1,0,0
New_England_Interceptions,TEAM,1,0,0

caption,,NO,YDS,AVG,LONG,TD
Miami_Kick_Returns,No Miami Kick Returns,,,,,

caption,,NO,YDS,AVG,LONG,TD
New_England_Kick_Returns,Brandon Bolden  B. Bolden,1,23,23.0,23,0
New_England_Kick_Returns,Gunner Olszewski  G. Olszewski,1,17,17.0,17,0
New_England_Kick_Returns,TEAM,2,40,20.0,23,0

caption,,NO,YDS,AVG,LONG,TD
Miami_Punt_Returns,Jakeem Grant Sr.  J. Grant Sr.,1,18,18.0,18,0
Miami_Punt_Returns,TEAM,1,18,18.0,18,0

caption,,NO,YDS,AVG,LONG,TD
New_England_Punt_Returns,Gunner Olszewski  G. Olszewski,3,20,6.7,14,0
New_England_Punt_Returns,TEAM,3,20,6.7,14,0

caption,,FG,PCT,LONG,XP,PTS
Miami_Kicking,Jason Sanders  J. Sanders,1/1,100.0,48,2/2,5
Miami_Kicking,TEAM,1/1,100.0,48,2/2,5

caption,,FG,PCT,LONG,XP,PTS
New_England_Kicking,Nick Folk  N. Folk,3/3,100.0,42,1/1,10
New_England_Kicking,TEAM,3/3,100.0,42,1/1,10

caption,,NO,YDS,AVG,TB,In 20,LONG
Miami_Punting,Michael Palardy  M. Palardy,4,180,45.0,1,0,52
Miami_Punting,TEAM,4,180,45.0,1,0,52

caption,,NO,YDS,AVG,TB,In 20,LONG
New_England_Punting,Jake Bailey  J. Bailey,2,99,49.5,1,0,62
New_England_Punting,TEAM,2,99,49.5,1,0,62

Upvotes: 1

Related Questions