ffspider

Reputation: 21

Can't drop columns in table

I'm getting back to a project I put aside a few months ago. While reviewing my code, I got stuck after importing a dataframe: for some reason I can't drop certain columns, and I only need 4 of them.

I'm a beginner btw.

So I'm trying to get data from this table:

import pandas as pd

import requests

url = 'https://www.hockey-reference.com/leagues/NHL_2022_goalies.html'
html = requests.get(url).content
df_list = pd.read_html(url)
df = df_list[0]
df.droplevel(level=0, axis='columns').filter(['Rk', 'Player', 'SV%', 'QS%'])
print(df)

But I get the whole table.

  Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Goalie Stats                      Unnamed: 10_level_0 Unnamed: 11_level_0 Unnamed: 12_level_0 Goalie Stats    Unnamed: 15_level_0 Goalie Stats                           Scoring           
                  Rk             Player                Age                 Tm           GP  GS   W   L T/O   GA                  SA                  SV                 SV%          GAA SO                 GPS          MIN  QS   QS% RBS GA%-   GSAA       G  A PTS PIM
0                  1         Jake Allen                 31                MTL           35  35   9  20   4  107                1123                1016                .905         3.30  2                 6.0         1948  18  .514   8  102  -2.55       0  0   0   2
1                  2      Hugo Alnefelt                 20                TBL            1   0   0   0   0    3                  10                   7                .700         9.00  0                -0.2           20   0   NaN   0  NaN    NaN       0  0   0   0
2                  3  Frederik Andersen                 32                CAR           52  51  35  14   3  111                1431                1320                .922         2.17  4                10.2         3071  30  .588   5   83  22.09       0  4   4   0
3                  4     Craig Anderson                 40                BUF           31  31  17  12   2   97                 945                 848                .897         3.12  0                 4.3         1867  14  .452   7  110  -9.11       0  1   1   0
4                  5     Justus Annunen                 21                COL            2   1   1   0   1    7                  51                  44                .863         4.34  0                 0.1           97   0  .000   1  NaN    NaN       0  1   1   0

What am I doing wrong here?

Upvotes: 1

Views: 152

Answers (2)

Abdel

Reputation: 159

import pandas as pd
import requests

url = 'https://www.hockey-reference.com/leagues/NHL_2022_goalies.html'
html = requests.get(url).content
df_list = pd.read_html(url)
df = df_list[0]
# droplevel() and filter() return new DataFrames; assign the result back to df
df = df.droplevel(level=0, axis='columns').filter(['Rk', 'Player', 'SV%', 'QS%'])
print(df)

Which gives the following result:

    Rk              Player   SV%   QS%
0    1          Jake Allen  .905  .800
1    2   Frederik Andersen  .944  .750
2    3      Craig Anderson  .933  .667
3    4    Jonathan Bernier  .911  .500
4    5   Jordan Binnington  .919  .750
..  ..                 ...   ...   ...
63  61      Karel Vejmelka  .900  .333
64  62       Daniel Vladar  .880  .000
65  63     Scott Wedgewood  .852  .000
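
The only functional change from the question's code is the assignment: `droplevel` and `filter` each return a new DataFrame and leave the original untouched. Here is a minimal offline sketch of that behaviour. The small hand-built frame is an assumption standing in for the scraped table, with a few values copied from the question's output:

```python
import pandas as pd

# Hand-built stand-in for the scraped table (hypothetical sample):
# two header levels, like the frame pd.read_html returns for this page.
columns = pd.MultiIndex.from_tuples([
    ("Unnamed: 0_level_0", "Rk"),
    ("Unnamed: 1_level_0", "Player"),
    ("Unnamed: 2_level_0", "Age"),
    ("Unnamed: 12_level_0", "SV%"),
    ("Goalie Stats", "QS%"),
])
df = pd.DataFrame(
    [[1, "Jake Allen", 31, ".905", ".514"],
     [3, "Frederik Andersen", 32, ".922", ".588"]],
    columns=columns,
)

wanted = ["Rk", "Player", "SV%", "QS%"]

# Without an assignment the result is discarded and df stays as it was.
df.droplevel(level=0, axis="columns").filter(wanted)
unchanged_shape = df.shape

# Assigning the result keeps the trimmed frame.
trimmed = df.droplevel(level=0, axis="columns").filter(wanted)
print(trimmed)
```

Binding the result to a new name such as `trimmed` (a name chosen here for illustration) also keeps the full table available in `df` in case other columns are needed later.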

Upvotes: 1

Doğu Can Elçi

Reputation: 23

It is not very time-efficient, but I saved the dataframe as a .csv:

import pandas as pd
import requests
url = 'https://www.hockey-reference.com/leagues/NHL_2022_goalies.html'
html = requests.get(url).content
df_list = pd.read_html(url)
df = df_list[0]
df.to_csv('df1.csv')

and then I changed the first line of the CSV file manually (image not available).

After that edit, the table looks as intended (image not available).
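
The manual header edit can probably be scripted instead. The sketch below assumes the aim of the edit was a single header row made of the inner column names; the screenshots are not available, so that intent is a guess. It uses a small hand-built frame in place of the scraped table and an in-memory buffer in place of `df1.csv`:

```python
import io

import pandas as pd

# Hand-built stand-in for the scraped table (hypothetical sample values).
columns = pd.MultiIndex.from_tuples([
    ("Unnamed: 0_level_0", "Rk"),
    ("Unnamed: 1_level_0", "Player"),
    ("Unnamed: 2_level_0", "Age"),
    ("Unnamed: 12_level_0", "SV%"),
    ("Goalie Stats", "QS%"),
])
df = pd.DataFrame(
    [[1, "Jake Allen", 31, 0.905, 0.514],
     [3, "Frederik Andersen", 32, 0.922, 0.588]],
    columns=columns,
)

# Keep only the inner header level so the CSV gets one header row.
df.columns = df.columns.get_level_values(1)

# Write to an in-memory buffer; a real script would pass a file path instead.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# Read back only the four columns of interest.
reloaded = pd.read_csv(buffer, usecols=["Rk", "Player", "SV%", "QS%"])
print(reloaded)
```

If the CSV round trip is not needed, the `get_level_values(1)` line alone flattens the header, and the columns can then be selected directly from `df`.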

Upvotes: 1
