Reputation: 620
I have a dataframe:
Energy Supply Energy Supply per Capita % Renewable
Country
Afghanistan 3.210000e+08 10 78.669280
Albania 1.020000e+08 35 100.000000
British Virgin Islands 2.000000e+06 85 0.000000
...
Aruba 1.200000e+07 120 14.870690 ...
Excludes the overseas territories. NaN NaN NaN
Data exclude Hong Kong and Macao Special Admini... NaN NaN NaN
Data on kerosene-type jet fuel include aviation... NaN NaN NaN
For confidentiality reasons, data on coal and c... NaN NaN NaN
Data exclude Greenland and the Danish Faroes. NaN NaN NaN
I had used df = pd.read_excel(filelink, skiprows=16)
to cut unwanted information at the very beginning of the file but how can I get rid of the "noize"-information at the end of df?
I had tried to pass a list to skiprows but it messed the results up.
Upvotes: 1
Views: 3172
Reputation: 863056
It seems you need parameter skip_footer = 5
in read_excel
:
skip_footer : int, default 0
Rows at the end to skip (0-indexed)
Sample:
df = pd.read_excel('myfile.xlsx', skip_footer = 5)
print (df)
Country Energy Supply Energy Supply per Capita \
0 Afghanistan 321000000.0 10
1 Albania 102000000.0 35
2 British Virgin Islands 2000000.0 85
3 Aruba 12000000.0 120
% Renewable
0 78.66928
1 100.00000
2 0.00000
3 14.87069
Another solution is remove all rows where all NaN
in some columns with dropna
:
df = pd.read_excel('myfile.xlsx')
cols = ['Energy Supply','Energy Supply per Capita','% Renewable']
df = df.dropna(subset=cols, how='all')
print (df)
Country Energy Supply Energy Supply per Capita \
0 Afghanistan 321000000.0 10.0
1 Albania 102000000.0 35.0
2 British Virgin Islands 2000000.0 85.0
3 Aruba 12000000.0 120.0
% Renewable
0 78.66928
1 100.00000
2 0.00000
3 14.87069
Upvotes: 5