Reputation: 7929
Say I have the following Excel file:
A B C
0 - - -
1 Start - -
2 3 2 4
3 7 8 4
4 11 2 17
I want to read the file in a dataframe making sure that I start to read it below the row where the Start
value is.
Attention: the Start
value is not always located in the same row, so if I were to use:
import pandas as pd
xls = pd.ExcelFile('C:\Users\MyFolder\MyFile.xlsx')
df = xls.parse('Sheet1', skiprows=4, index_col=None)
this would fail as skiprows
needs to be fixed. Is there any workaround to make sure that xls.parse
finds the string value instead of the row number?
Upvotes: 9
Views: 53358
Reputation: 31
You could use pd.read_excel('C:\Users\MyFolder\MyFile.xlsx', sheet_name='Sheet1')
as it ignores empty excel cells.
Your DataFrame should then look like this:
A B C
0 Start NaN NaN
1 3 2 4
2 7 8 4
3 11 2 17
Then drop the first row by using
df.drop([0])
to get
A B C
0 3 2 4
1 7 8 4
2 11 2 17
Upvotes: 3
Reputation: 735
If you know the specific rows you are interested in, you can skip from the top using skiprow
and then parse only the row (or rows) you want using nrows
- see pandas.read_excel
df = pd.read_excel('myfile.xlsx', 'Sheet1', skiprows=2, nrows=3,)
Upvotes: 11
Reputation: 382
df = pd.read_excel('your/path/filename')
This answer helps in finding the location of 'start' in the df
for row in range(df.shape[0]):
for col in range(df.shape[1]):
if df.iat[row,col] == 'start':
row_start = row
break
after having row_start you can use subframe of pandas
df_required = df.loc[row_start:]
And if you don't need the row containing 'start', just u increment row_start by 1
df_required = df.loc[row_start+1:]
Upvotes: 12