Create dataframe of all rows AFTER varying amount of header data Python Pandas

Question

I have dataframes with a varying amount of header data. I need to remove the header data (i.e. create a new dataframe containing only the data that comes after this header).

I have used the following code to find the row where the header data ends.

df = xlsx_file.parse('ActualSheet', header=None)
value_list = ['var1','var2']
df_Header = df[df[0].isin(value_list) & (df[1].isin(value_list))]

The above code works and creates a dataframe of the final row of header data.

I am having trouble creating a new dataframe from the original data that only includes the rows AFTER this "df_Header" row.

Any help is appreciated. I know the answer is probably already out there, but I could not find it.

MaxU - stand with Ukraine · Accepted Answer

IIUC you can do it this way:

df = df[df_Header.index.max() + 1:]

or, if you only need to drop the matching row(s) themselves:

df = df[~(df[0].isin(value_list) & (df[1].isin(value_list)))]
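
The slicing approach can be sketched end to end on a small in-memory frame. The sample rows below are made up for illustration, and the names is_marker, last_header_pos and data are hypothetical. The sketch assumes at least one row matches value_list and uses iloc with a position, so it does not depend on the index labels:

```python
import pandas as pd

# Hypothetical sheet contents: two rows of header junk, the marker row, then data.
df = pd.DataFrame([
    ["Report", "2016"],
    ["Site", "A"],
    ["var1", "var2"],
    [1, 10],
    [2, 20],
])

value_list = ["var1", "var2"]

# True only for the row where both columns hold one of the marker values.
is_marker = df[0].isin(value_list) & df[1].isin(value_list)

# Position of the last marker row (assumes at least one row matches).
last_header_pos = is_marker.to_numpy().nonzero()[0].max()

# Keep only the rows after the marker row, then renumber from 0.
data = df.iloc[last_header_pos + 1:].reset_index(drop=True)
```

Here data holds just the two rows below the var1/var2 row. Dropping the "+ 1" would keep the marker row as the first row of the result.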

PS: you may also want to make use of the header and/or skiprows parameters of the read_excel() function.
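
A rough sketch of the skiprows idea when the number of header rows is not known in advance: read once without a header to locate the marker row, then read again and skip everything above it. To keep the example runnable without an Excel file, it uses read_csv on an in-memory string as a stand-in; read_excel accepts header and skiprows as well. The sample text and the names raw, first_pass and marker_pos are made up for illustration:

```python
import io
import pandas as pd

# Hypothetical file contents: two junk rows, the marker row, then data.
raw = "Report,2016\nSite,A\nvar1,var2\n1,10\n2,20\n"
value_list = ["var1", "var2"]

# First pass: no header, so every line is an ordinary row.
first_pass = pd.read_csv(io.StringIO(raw), header=None)
is_marker = first_pass[0].isin(value_list) & first_pass[1].isin(value_list)

# Position of the marker row (assumes at least one row matches).
marker_pos = int(is_marker.to_numpy().nonzero()[0].max())

# Second pass: skip the junk rows and use the marker row as column names.
data = pd.read_csv(io.StringIO(raw), skiprows=marker_pos, header=0)
```

Unlike the slicing approach, this keeps var1 and var2 as column names and lets pandas infer numeric dtypes for the data rows, at the cost of reading the file twice.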
