sam oconnell

Reputation: 1

Dealing with Empty Cells from Webpage

I'm trying to get all of the data from a table on basketball-reference (http://www.basketball-reference.com/leagues/NBA_2015_per_poss.html). When I use XPath to extract the data, it comes back as one long flat list. I have a "chunks" method that splits that list into one list per row, but because some cells in the table are empty, the values shift out of alignment and the list gets divided incorrectly. Is there any way to deal with this?
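To show why the flat list goes wrong, and what a row-by-row alternative could look like, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree on a small, well-formed sample table (hypothetical data, not the real page) only so that it runs without extra packages. On the live page you would more likely parse with lxml.html and loop over the `<tr>` elements the same way. The `chunks` helper here is an assumed version of the one described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample table: the second row has an empty cell.
HTML = """
<table>
  <tr><td>1</td><td>Player A</td><td>0.45</td></tr>
  <tr><td>2</td><td>Player B</td><td></td></tr>
  <tr><td>3</td><td>Player C</td><td>0.51</td></tr>
</table>
"""

def chunks(items, n):
    """Split a flat list into consecutive lists of length n (assumed helper)."""
    return [items[i:i + n] for i in range(0, len(items), n)]

root = ET.fromstring(HTML.strip())

# Flat extraction, similar to an XPath '//td/text()' query: an empty cell
# has no text node, so it silently disappears from the list.
flat = [td.text for td in root.iter("td") if td.text]
misaligned = chunks(flat, 3)  # the second "row" now borrows '3' from the third

# Row-wise extraction: one list per <tr>, keeping '' for empty cells,
# so every row keeps the same number of columns.
rows = [[td.text or "" for td in tr.findall("td")] for tr in root.iter("tr")]
```

The design choice that matters is grouping cells by their `<tr>` before flattening anything: an empty cell then becomes an empty string inside its own row instead of shifting every value that follows it.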

Upvotes: 0

Views: 39

Answers (1)

DeepSpace

Reputation: 81684

My suggestion: use pandas. Its read_html function parses HTML tables straight into pandas.DataFrame objects, and pandas can load data from many other sources as well.

You can easily handle empty cells with the fillna method.
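As a small offline illustration of fillna, here is a sketch on a hand-built DataFrame where NaN stands in for the cells that were empty in the HTML table. The column names and values are hypothetical. It shows both a single fill value and a per-column dictionary.

```python
import numpy as np
import pandas as pd

# Hypothetical data; NaN marks cells that were empty in the source table.
df = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C"],
    "FG%": [0.45, np.nan, 0.51],
    "3P%": [np.nan, 0.33, 0.38],
})

# One value for every empty cell.
zeros = df.fillna(0)

# A dict gives each column its own fill value; here FG% gets 0.0 and
# 3P% gets the mean of its non-missing values.
mixed = df.fillna({"FG%": 0.0, "3P%": df["3P%"].mean()})
```

Filling with 0 is simplest, but for rate stats like percentages a per-column value (or leaving NaN in place) may distort averages less.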

Consider this example:

import pandas as pd

# read_html returns a list of DataFrames, one per matching <table>.
# The attrs filter matches only the table with id 'per_poss', so take the first
df = pd.read_html('http://www.basketball-reference.com/leagues/NBA_2015_per_poss.html',
                  attrs={'id': 'per_poss'})[0] 

# the header row repeats every 20 rows, so filter those rows out
df = df[df['Rk'] != 'Rk'] 

# insert 0 into empty cells
# could also use the inplace=True kwarg instead of reassigning, or pass a
# dictionary to use a different value for each column
df = df.fillna(0) 
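Because the repeated header rows are mixed into the data, the numeric columns may come back as strings (object dtype) even after those rows are dropped. Here is a sketch of the same clean-up on hypothetical offline data that mimics what read_html might return, adding a pd.to_numeric conversion step. The column choice and values are illustrative assumptions, not taken from the real page.

```python
import pandas as pd

# Hypothetical stand-in for read_html output: a repeated header row appears
# as a data row whose 'Rk' cell is the literal string 'Rk', and an empty
# cell appears as a missing value.
raw = pd.DataFrame({
    "Rk": ["1", "2", "Rk", "3"],
    "Player": ["Player A", "Player B", "Player", "Player C"],
    "ORtg": ["110", None, "ORtg", "104"],
})

# Drop the repeated header rows and renumber the index.
df = raw[raw["Rk"] != "Rk"].reset_index(drop=True)

# Convert columns that should be numeric; errors="coerce" turns anything
# unparseable (including missing values) into NaN.
numeric_cols = ["Rk", "ORtg"]
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")

# Finally fill the empty cells.
df = df.fillna(0)
```

Converting before filling means the filled 0 lands in a real numeric column, so later arithmetic and sorting behave as expected.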

Upvotes: 1
