topjor

Reputation: 9

Pandas can't print the list of objects collected from the web using XPath in Jupyter Notebook

This is the code I used. I am using the web version of Jupyter Notebook. I upgraded the XML, and my Python version is 3.8.

import numpy as np
import requests 
from lxml import html
import csv
import pandas as pd

# getting the web content

r = requests.get('http://www.pro-football-reference.com/years/2017/draft.htm')
data = html.fromstring(r.text)

# collecting specific data

pick = data.xpath('//td[@data_stat="draft_pick"]//text()')
player = data.xpath('//td[@data_stat="player"]//text()')
position = data.xpath('//td[@data_stat="pos"]//text()')
age= data.xpath('//td[@data_stat="age"]//text()')
games_played = data.xpath('//td[@data_stat="g"]//text()')
cmp = data.xpath('//td[@data_stat="pass_cmp"]//text()')
att = data.xpath('//td[@data_stat="pass_att"]//text()')
college = data.xpath('//td[@data_stat="college_id"]//text()')

data = list(zip(pick,player,position,age,games_played,cmp,att,college))

df = pd.DataFrame(data)
df

There are two errors showing on two separate files I tried:

  1. <module 'pandas' from 'C:\Users\anaconda3\lib\site-packages\pandas\__init__.py'>
  2. AttributeError: 'list' object has no attribute 'xpath'

The code is not giving me the list of data I wanted from the webpage. Can anyone help me out with this? Thank you in advance.

Upvotes: 0

Views: 30

Answers (1)

RJ Adriaansen

Reputation: 9639

You can load HTML tables directly into a DataFrame using pd.read_html:

import pandas as pd

df = pd.read_html('http://www.pro-football-reference.com/years/2017/draft.htm')[0]
df.columns = df.columns.droplevel(0) # drop top header row
df = df[df['Rnd'].ne('Rnd')] # remove mid-table header rows 

Output:

|    |   Rnd |   Pick | Tm   | Player            | Pos   |   Age |   To |   AP1 |   PB |   St |   CarAV |   DrAV |   G |   Cmp |   Att |   Yds |   TD |   Int |   Att |   Yds |   TD |   Rec |   Yds |   TD |   Solo |   Int |    Sk | College/Univ   | Unnamed: 28_level_1   |
|---:|------:|-------:|:-----|:------------------|:------|------:|-----:|------:|-----:|-----:|--------:|-------:|----:|------:|------:|------:|-----:|------:|------:|------:|-----:|------:|------:|-----:|-------:|------:|------:|:---------------|:----------------------|
|  0 |     1 |      1 | CLE  | Myles Garrett     | DE    |    21 | 2020 |     1 |    2 |    4 |      35 |     35 |  51 |     0 |     0 |     0 |    0 |     0 |     0 |     0 |    0 |     0 |     0 |    0 |    107 |   nan |  42.5 | Texas A&M      | College Stats         |
|  1 |     1 |      2 | CHI  | Mitchell Trubisky | QB    |    23 | 2020 |     0 |    1 |    3 |      33 |     33 |  51 |  1010 |  1577 | 10609 |   64 |    37 |   190 |  1057 |    8 |     0 |     0 |    0 |    nan |   nan | nan   | North Carolina | College Stats         |
|  2 |     1 |      3 | SFO  | Solomon Thomas    | DE    |    22 | 2020 |     0 |    0 |    2 |      15 |     15 |  48 |     0 |     0 |     0 |    0 |     0 |     0 |     0 |    0 |     0 |     0 |    0 |     73 |   nan |   6   | Stanford       | College Stats         |
|  3 |     1 |      4 | JAX  | Leonard Fournette | RB    |    22 | 2020 |     0 |    0 |    3 |      25 |     20 |  49 |     0 |     0 |     0 |    0 |     0 |   763 |  2998 |   23 |   170 |  1242 |    2 |    nan |   nan | nan   | LSU            | College Stats         |
|  4 |     1 |      5 | TEN  | Corey Davis       | WR    |    22 | 2020 |     0 |    0 |    4 |      25 |     25 |  56 |     0 |     0 |     0 |    0 |     0 |     6 |    55 |    0 |   207 |  2851 |   11 |    nan |   nan | nan   | West. Michigan | College Stats         |
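
If you would rather keep the XPath approach, the AttributeError in the question most likely comes from `data = list(zip(...))`: it rebinds `data` from the parsed document to a plain list, so any later `data.xpath(...)` call (for example after re-running a notebook cell) fails. Below is a minimal sketch of one way to restructure that code, under two assumptions: the inline `SAMPLE` string is a hypothetical stand-in for the downloaded page so the example runs offline, and the table cells carry a `data-stat` attribute (with a hyphen) rather than `data_stat`, which you should verify against the page source.

```python
from lxml import html
import pandas as pd

# Hypothetical stand-in for r.text, so the sketch runs without network access.
# Assumption: the real page marks its cells with data-stat="..." (hyphen).
SAMPLE = (
    '<table>'
    '<tr><td data-stat="draft_pick">1</td>'
    '<td data-stat="player">Myles Garrett</td>'
    '<td data-stat="pos">DE</td></tr>'
    '<tr><td data-stat="draft_pick">2</td>'
    '<td data-stat="player">Mitchell Trubisky</td>'
    '<td data-stat="pos">QB</td></tr>'
    '</table>'
)

# Keep the parsed document under its own name and never reuse that name
# for the collected rows; reusing it is what turns it into a list.
tree = html.fromstring(SAMPLE)

rows = []
# Walk the table one row at a time so the values of a row stay together.
for tr in tree.xpath('//tr[td[@data-stat="player"]]'):
    rows.append({
        "pick": tr.xpath('string(td[@data-stat="draft_pick"])'),
        "player": tr.xpath('string(td[@data-stat="player"])'),
        "position": tr.xpath('string(td[@data-stat="pos"])'),
    })

df = pd.DataFrame(rows)
print(df)
```

Extracting row by row keeps the columns aligned even when a cell is empty, whereas zipping separate `//text()` lists can silently shift values, because an empty cell contributes no text node and `zip` stops at the shortest list.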

Upvotes: 1
