I'm getting back to a project I put aside a few months ago. While reviewing my code, I got stuck on importing a DataFrame: for some reason I can't drop certain columns, and I only need 4 of them.
I'm a beginner, by the way.
So I'm trying to get data from this table:
import pandas as pd
import requests
url = 'https://www.hockey-reference.com/leagues/NHL_2022_goalies.html'
html = requests.get(url).content
df_list = pd.read_html(url)
df = df_list[0]
df.droplevel(level=0, axis='columns').filter(['Rk', 'Player', 'SV%', 'QS%'])
print(df)
But I get the whole table.
Unnamed: 0_level_0 Unnamed: 1_level_0 Unnamed: 2_level_0 Unnamed: 3_level_0 Goalie Stats Unnamed: 10_level_0 Unnamed: 11_level_0 Unnamed: 12_level_0 Goalie Stats Unnamed: 15_level_0 Goalie Stats Scoring
Rk Player Age Tm GP GS W L T/O GA SA SV SV% GAA SO GPS MIN QS QS% RBS GA%- GSAA G A PTS PIM
0 1 Jake Allen 31 MTL 35 35 9 20 4 107 1123 1016 .905 3.30 2 6.0 1948 18 .514 8 102 -2.55 0 0 0 2
1 2 Hugo Alnefelt 20 TBL 1 0 0 0 0 3 10 7 .700 9.00 0 -0.2 20 0 NaN 0 NaN NaN 0 0 0 0
2 3 Frederik Andersen 32 CAR 52 51 35 14 3 111 1431 1320 .922 2.17 4 10.2 3071 30 .588 5 83 22.09 0 4 4 0
3 4 Craig Anderson 40 BUF 31 31 17 12 2 97 945 848 .897 3.12 0 4.3 1867 14 .452 7 110 -9.11 0 1 1 0
4 5 Justus Annunen 21 COL 2 1 1 0 1 7 51 44 .863 4.34 0 0.1 97 0 .000 1 NaN NaN 0 1 1 0
What am I doing wrong here?
pandas.DataFrame.droplevel and pandas.DataFrame.filter are not in-place operations: each one returns a new DataFrame and leaves the original unchanged. The result therefore has to be assigned back to a variable, e.g. df = ...:
import pandas as pd
import requests
url = 'https://www.hockey-reference.com/leagues/NHL_2022_goalies.html'
html = requests.get(url).content
df_list = pd.read_html(url)
df = df_list[0]
df = df.droplevel(level=0, axis='columns').filter(['Rk', 'Player', 'SV%', 'QS%'])
print(df)
Which gives the following result:
Rk Player SV% QS%
0 1 Jake Allen .905 .800
1 2 Frederik Andersen .944 .750
2 3 Craig Anderson .933 .667
3 4 Jonathan Bernier .911 .500
4 5 Jordan Binnington .919 .750
.. .. ... ... ...
63 61 Karel Vejmelka .900 .333
64 62 Daniel Vladar .880 .000
65 63 Scott Wedgewood .852 .000
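To see the difference without fetching the page, here is a minimal sketch on a small hypothetical stand-in for df_list[0]: a two-row DataFrame with two header levels, shaped like what read_html returns. The level-0 labels and the reduced set of columns are assumptions; the values are taken from the question's output.

```python
import pandas as pd

# Hypothetical stand-in for df_list[0]: two-level column headers like the
# hockey-reference table, but with only a few columns.
columns = pd.MultiIndex.from_tuples([
    ("Unnamed: 0_level_0", "Rk"),
    ("Unnamed: 1_level_0", "Player"),
    ("Unnamed: 2_level_0", "Age"),
    ("Goalie Stats", "SV%"),
    ("Goalie Stats", "QS%"),
])
df = pd.DataFrame(
    [[1, "Jake Allen", 31, ".905", ".514"],
     [2, "Hugo Alnefelt", 20, ".700", None]],
    columns=columns,
)

# droplevel() and filter() both return a new DataFrame, so this line
# builds the trimmed table and then throws it away; df is unchanged.
df.droplevel(level=0, axis="columns").filter(["Rk", "Player", "SV%", "QS%"])
print(df.columns.nlevels)  # still 2 header levels

# Assigning the result back is what actually replaces df.
df = df.droplevel(level=0, axis="columns").filter(["Rk", "Player", "SV%", "QS%"])
print(list(df.columns))
```

The first call's result is simply discarded; only the assignment changes what df refers to.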
It's not very time-efficient, but I saved the DataFrame as a .csv file:
import pandas as pd
import requests
url = 'https://www.hockey-reference.com/leagues/NHL_2022_goalies.html'
html = requests.get(url).content
df_list = pd.read_html(url)
df = df_list[0]
df.to_csv('df1.csv')
and then I manually changed the first line of the CSV file like this:
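As an alternative to editing the header line by hand, the extra header row can be dropped in code before writing the CSV. This is a minimal sketch assuming the same two-level column layout; the small DataFrame below is a hypothetical stand-in for df_list[0], and df1.csv is reused from the answer above.

```python
import pandas as pd

# Hypothetical stand-in for df_list[0] with two header rows
# (the real table has many more columns).
columns = pd.MultiIndex.from_tuples([
    ("Unnamed: 0_level_0", "Rk"),
    ("Unnamed: 1_level_0", "Player"),
    ("Unnamed: 2_level_0", "Age"),
    ("Goalie Stats", "SV%"),
    ("Goalie Stats", "QS%"),
])
df = pd.DataFrame(
    [[1, "Jake Allen", 31, ".905", ".514"],
     [2, "Hugo Alnefelt", 20, ".700", None]],
    columns=columns,
)

# Drop the top header level so the CSV gets a single header line.
df.columns = df.columns.droplevel(0)
df.to_csv("df1.csv", index=False)

# Read back only the four columns of interest (kept in file order).
subset = pd.read_csv("df1.csv", usecols=["Rk", "Player", "SV%", "QS%"])
print(subset)
```

Note that read_csv parses ".905" as the float 0.905 and the empty QS% cell as NaN.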