How to start reading in an Excel file at a certain row based on a condition in pandas

Question

I read in Excel files that are normally formatted like this:

colA colB
   0    0
   1    1

and I can just write something like df = pd.read_excel(filename, skiprows=0)

which treats the first row as the column headers and ingests the data. However, sometimes my data comes in as

some random text in the cells above
colA colB
   0    0
   1    1

where I would need to delete that extra row manually and shift everything up so that the first row contains the column headers. Is there an elegant way to start the Excel read at whatever row colA is found in, so that any unnecessary entries or text above the colA and colB headers are skipped?

Toby Petty · Accepted Answer

Assuming you know the first column name (i.e. colA in your example), and that this value will be present somewhere in the first column of data:

import pandas as pd

df = pd.read_excel(filename)  # filename as in the question

if df.columns[0] != "colA":  # Only fix the frame if the real header row was not picked up.
    # Get the first column of data:
    first_col = df[df.columns[0]]
    # Identify the index label of the first row whose value equals the column name:
    header_row_index = first_col.loc[first_col == "colA"].index[0]
    # Grab the column names from that row:
    column_names = df.loc[header_row_index].tolist()
    # Keep only the rows below the header row, renumber them from 0, and rename the columns:
    df = df.loc[header_row_index+1:, :].reset_index(drop=True)
    df.columns = column_names
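
The same idea can be wrapped in a small helper. This is only a sketch, not part of the original answer: the function name promote_header_row is made up for illustration, and it assumes the sheet was read with header=None (e.g. raw = pd.read_excel(filename, header=None)) so that the real header row is still sitting in the data. The demo builds the frame in memory instead of reading a file.

```python
import pandas as pd


def promote_header_row(raw, first_col_name):
    """Return a copy of `raw` that starts below the row whose first cell
    equals `first_col_name`, using that row as the column names.

    Hypothetical helper: assumes `raw` was read with header=None.
    """
    # Positions (not labels) of rows whose first cell matches the header name.
    hits = (raw.iloc[:, 0] == first_col_name).to_numpy().nonzero()[0]
    if len(hits) == 0:
        raise ValueError(f"{first_col_name!r} not found in the first column")
    header_pos = int(hits[0])

    # Keep the rows below the header row and renumber them from 0.
    body = raw.iloc[header_pos + 1:].reset_index(drop=True)
    body.columns = raw.iloc[header_pos].tolist()

    # The columns were object dtype because of the stray text above the
    # header; let pandas re-infer better dtypes where it can.
    return body.infer_objects()


# In-memory stand-in for pd.read_excel(filename, header=None).
raw = pd.DataFrame(
    [
        ["some random text in the cells above", None],
        ["colA", "colB"],
        [0, 0],
        [1, 1],
    ]
)
df = promote_header_row(raw, "colA")
print(df)
```

Because the helper searches for the header instead of assuming a fixed offset, it also handles files with no junk rows (the header is found at position 0) and raises an error rather than returning a wrongly labelled frame when the header is missing.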
