This is the code I used. I am using the web version of Jupyter Notebook, I upgraded lxml, and my Python version is 3.8.
```python
import numpy as np
import requests
from lxml import html
import csv
import pandas as pd

# getting the web content
r = requests.get('http://www.pro-football-reference.com/years/2017/draft.htm')
data = html.fromstring(r.text)

# collecting specific data
pick = data.xpath('//td[@data_stat="draft_pick"]//text()')
player = data.xpath('//td[@data_stat="player"]//text()')
position = data.xpath('//td[@data_stat="pos"]//text()')
age = data.xpath('//td[@data_stat="age"]//text()')
games_played = data.xpath('//td[@data_stat="g"]//text()')
cmp = data.xpath('//td[@data_stat="pass_cmp"]//text()')
att = data.xpath('//td[@data_stat="pass_att"]//text()')
college = data.xpath('//td[@data_stat="college_id"]//text()')

data = list(zip(pick, player, position, age, games_played, cmp, att, college))
df = pd.DataFrame(data)
df
```
I tried this in two separate files, and each one showed a different error:
The code does not return the data I want from the webpage. Can anyone help me with this? Thank you in advance.
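A likely cause of the empty result, assuming the page marks its cells with a hyphenated `data-stat` attribute rather than `data_stat`: every XPath query above then matches nothing, and `zip` of empty lists yields an empty DataFrame. Separately, `//text()` skips cells that have no text, so a column like `pass_cmp` can come back shorter than `player`, and `zip` would silently pair values from different rows. The sketch below reads one table row at a time so each value stays with its player; the function name `parse_draft_table` and the `FIELDS` list are illustrative, not part of the original code.

```python
import pandas as pd
from lxml import html

# data-stat values to extract (assumed to match the page's hyphenated attribute)
FIELDS = ["draft_pick", "player", "pos", "age", "g", "pass_cmp", "pass_att", "college_id"]


def parse_draft_table(page_html: str) -> pd.DataFrame:
    """Build one record per table row so empty cells cannot shift columns."""
    tree = html.fromstring(page_html)
    records = []
    # Only rows that contain a <td data-stat="player"> cell; the repeated
    # header rows use <th> cells, so they are skipped by this filter.
    for row in tree.xpath('//tr[td[@data-stat="player"]]'):
        record = {}
        for field in FIELDS:
            cells = row.xpath(f'./td[@data-stat="{field}"]')
            # text_content() returns '' for an empty cell instead of dropping it
            record[field] = cells[0].text_content().strip() if cells else None
        records.append(record)
    return pd.DataFrame(records, columns=FIELDS)
```

To run it against the live page, pass the downloaded HTML in, e.g. `parse_draft_table(requests.get(url).text)`. Empty cells come back as empty strings, so convert numeric columns afterwards (for example with `pd.to_numeric(..., errors="coerce")`) if you need numbers.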
You can load HTML tables directly into a DataFrame using `pd.read_html`:
```python
import pandas as pd

# read_html returns a list of DataFrames, one per <table>; [0] is the draft table
df = pd.read_html('http://www.pro-football-reference.com/years/2017/draft.htm')[0]
df.columns = df.columns.droplevel(0)  # drop top header row
df = df[df['Rnd'].ne('Rnd')]  # remove mid-table header rows
```
Output:
| | Rnd | Pick | Tm | Player | Pos | Age | To | AP1 | PB | St | CarAV | DrAV | G | Cmp | Att | Yds | TD | Int | Att | Yds | TD | Rec | Yds | TD | Solo | Int | Sk | College/Univ | Unnamed: 28_level_1 |
|---:|------:|-------:|:-----|:------------------|:------|------:|-----:|------:|-----:|-----:|--------:|-------:|----:|------:|------:|------:|-----:|------:|------:|------:|-----:|------:|------:|-----:|-------:|------:|------:|:---------------|:----------------------|
| 0 | 1 | 1 | CLE | Myles Garrett | DE | 21 | 2020 | 1 | 2 | 4 | 35 | 35 | 51 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 107 | nan | 42.5 | Texas A&M | College Stats |
| 1 | 1 | 2 | CHI | Mitchell Trubisky | QB | 23 | 2020 | 0 | 1 | 3 | 33 | 33 | 51 | 1010 | 1577 | 10609 | 64 | 37 | 190 | 1057 | 8 | 0 | 0 | 0 | nan | nan | nan | North Carolina | College Stats |
| 2 | 1 | 3 | SFO | Solomon Thomas | DE | 22 | 2020 | 0 | 0 | 2 | 15 | 15 | 48 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 73 | nan | 6 | Stanford | College Stats |
| 3 | 1 | 4 | JAX | Leonard Fournette | RB | 22 | 2020 | 0 | 0 | 3 | 25 | 20 | 49 | 0 | 0 | 0 | 0 | 0 | 763 | 2998 | 23 | 170 | 1242 | 2 | nan | nan | nan | LSU | College Stats |
| 4 | 1 | 5 | TEN | Corey Davis | WR | 22 | 2020 | 0 | 0 | 4 | 25 | 25 | 56 | 0 | 0 | 0 | 0 | 0 | 6 | 55 | 0 | 207 | 2851 | 11 | nan | nan | nan | West. Michigan | College Stats |
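One caveat with `droplevel(0)`: as the output shows, several sub-columns share a name (`Att`, `Yds` and `TD` each appear more than once), so `df['Att']` returns more than one column. A minimal alternative, sketched here under the assumption that the table keeps its two-level header, is to join the two levels into unique names and drop only the `Unnamed: ...` placeholders pandas generates. The helper name `flatten_columns` is hypothetical, and the group labels in the docstring are illustrative, since the exact top-level labels depend on the page.

```python
import pandas as pd


def flatten_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Join a two-level column header into single, unique column names.

    Top-level labels auto-generated by pandas ("Unnamed: ...") are dropped,
    e.g. ("Unnamed: 1_level_0", "Pick") -> "Pick", while real group labels
    are kept as a prefix, e.g. ("Passing", "Att") -> "Passing_Att".
    """
    flat = []
    for top, bottom in df.columns:
        if str(top).startswith("Unnamed"):
            flat.append(bottom)
        else:
            flat.append(f"{top}_{bottom}")
    out = df.copy()
    out.columns = flat
    return out
```

Used in place of the `droplevel(0)` line, the header-row filter still works, because the `Rnd` column keeps its plain name: `df = flatten_columns(df); df = df[df['Rnd'].ne('Rnd')]`.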