python pd.read_html gives error

Question

I've been using pandas and request to pull some tables to get NFL statistics. It's been going pretty well, I've been able to pull tables from other sites, until I tried to get NFL combine table from this one particular site.

It gives me the error message after df_list = pd.read_html(html)

The error I get is:

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('



Here's the code I've been using at other sites that worked really well.

import requests
import pandas as pd
df = pd.DataFrame()

url = 'http://nflcombineresults.com/nflcombinedata_expanded.php?
       year=1987&pos=&college='
html = requests.get(url).content
df_list = pd.read_html(html)
df = df_list[-1]


I've read and seen a little bit about BeautifulSoup, but the simplicity of the pd.read_html() is just so nice and compact. So I don't know if there's a quick fix that I am not aware of, or if I need to indeed dive into BeautifulSoup to get these tables from 1987 - 2017.

payne · Accepted Answer

This isn't shorter, but may be more robust:

import requests
import pandas as pd
from bs4 import BeautifulSoup

A convenience function:

def souptable(table):
    for row in table.find_all('tr'):
        yield [col.text for col in row.find_all('td')]

Return a DataFrame with data loaded for a given year:

def getyear(year):
    url = 'http://nflcombineresults.com/nflcombinedata_expanded.php?year=%d&pos=&college=' % year
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'lxml')
    data = list(souptable(soup.table))
    df = pd.DataFrame(data[1:], columns=data[0])
    df = df[pd.notnull(df['Name'])]
    return df.apply(pd.to_numeric, errors="ignore")

This function slices out the heading row when the DataFrame is created, uses the first row for column names, and filters out any rows with an empty Name value.

Finally, concatenate up as many years as you need into a single DataFrame:

dfs = pd.concat([getyear(year) for year in range(1987, 1990)])

python pd.read_html gives error

Answers (2)

Related Questions