Reputation: 23
df=pd.read_excel(filename)
What attributes should I add to exclude header and footer?
Upvotes: 2
Views: 17773
Reputation: 561
Head over here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
you will find
pandas.read_excel(io, sheet_name=0, header=0, names=None, index_col=None, usecols=None, squeeze=False, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skiprows=None, nrows=None, na_values=None, keep_default_na=True, verbose=False, parse_dates=False, date_parser=None, thousands=None, comment=None, skipfooter=0, convert_float=True, mangle_dupe_cols=True, **kwds)
Out of which ones which are useful to you are:
pandas.read_excel(skiprows=None, skipfooter=0)
you can specify the value in integer to skiprows=1
to skip header
and skipfooter=1
to skip footer
you can add as many rows are you want to skip
Hope it helps
Upvotes: 8
Reputation: 151
This is how I achieved it:
energy = pd.read_excel('Energy Indicators.xls', index_col=None, header=None, footer=None)
energy = energy[18:245].reset_index()
I used this for data cleaning in my assignment where I was required to remove header and footer.
So I first imported raw data from excel using
energy = pd.read_excel('Energy Indicators.xls', index_col=None, header=None, footer=None)
Now since the row from which actual required data started is 18 and row where footer started is 245 I used this
energy = energy[18:245].reset_index()
I used reset_index()
because after removing rows my index was messed up so to reset it.
You might want to remove that extra column named 'index' which is created by reset_index()
Upvotes: 5
Reputation: 1
you should create a sun dataframe from previous data frame which doesn't contain the header and footer attribute.
for example,
if you excel file contains 500 rows, in which first three rows are header and last 10 rows are footers then you can do as below:
df = df[4:491]
Upvotes: 0