brandog

Reputation: 1527

Create a DataFrame of all rows AFTER a varying amount of header data (Python, pandas)

I have DataFrames with a varying amount of header data. I need to remove the header data, i.e. create a new DataFrame containing only the data that comes after the header.

I have used the following code to find the row where the header data ends.

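import pandas as pd
xlsx_file = pd.ExcelFile('data.xlsx')  # hypothetical path; the question does not show how xlsx_file is created
df = xlsx_file.parse('ActualSheet', header=None)
value_list = ['var1', 'var2']
# rows whose first two columns both hold one of the header markers
df_Header = df[df[0].isin(value_list) & df[1].isin(value_list)]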

The above code works and creates a dataframe of the final row of header data.

I am having trouble creating a new dataframe from the original data that only includes the rows AFTER this "df_Header" row.

Any help is appreciated. I know the answer is already out there, but I could not find it.

Upvotes: 0

Views: 51

Answers (1)

MaxU - stand with Ukraine

Reputation: 210882

If I understand correctly, you can do it this way:

df = df.iloc[df_Header.index.max() + 1:]  # +1 skips the last header row itself; assumes the default RangeIndex from header=None
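A self-contained sketch of the slicing idea, using a small made-up DataFrame in place of the parsed sheet (the data and the helper name `rows_after_header` are hypothetical). It works with integer positions rather than index labels, so it does not depend on the index being a default RangeIndex, and it excludes the marker row itself:

```python
import pandas as pd


def rows_after_header(df: pd.DataFrame, value_list: list) -> pd.DataFrame:
    """Hypothetical helper: return the rows after the last row whose
    first two columns both contain a value from value_list."""
    # Same condition as in the question: columns 0 and 1 both hold a marker.
    mask = df[0].isin(value_list) & df[1].isin(value_list)
    # Integer positions of the matching rows, independent of index labels.
    positions = mask.to_numpy().nonzero()[0]
    if len(positions) == 0:
        raise ValueError("header marker row not found")
    last_header_pos = positions[-1]
    # +1 so the marker row itself is excluded from the result.
    return df.iloc[last_header_pos + 1:].reset_index(drop=True)


# Made-up sheet, shaped like what parse(..., header=None) might return.
raw = pd.DataFrame([
    ["Report", None],
    ["Some metadata", None],
    ["var1", "var2"],
    [1, 10],
    [2, 20],
])

data = rows_after_header(raw, ["var1", "var2"])
print(data)
```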

or

# note: this drops only the row(s) matching the condition, not any header rows above them
df = df[~(df[0].isin(value_list) & df[1].isin(value_list))]
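The two approaches are not equivalent: the boolean mask removes only the rows that match the condition, so header rows above the marker survive. A small sketch on made-up data shows the difference, plus one mask-only way (a reversed cumulative sum, not part of the original answer) to keep just the rows after the last marker:

```python
import pandas as pd

value_list = ["var1", "var2"]

# Made-up sheet, shaped like what parse(..., header=None) might return.
raw = pd.DataFrame([
    ["Report", None],
    ["Some metadata", None],
    ["var1", "var2"],
    [1, 10],
    [2, 20],
])

is_marker = raw[0].isin(value_list) & raw[1].isin(value_list)

# Boolean-mask approach: removes only the marker row; rows 0 and 1 remain.
dropped_matches = raw[~is_marker]

# Mask-only alternative: count markers from the bottom up. Rows with a count
# of 0 have no marker at or below them, i.e. they follow the last marker.
markers_at_or_below = is_marker.astype(int)[::-1].cumsum()[::-1]
after_last = raw[markers_at_or_below.eq(0)]

print(dropped_matches)
print(after_last)
```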

P.S. You may also want to make use of the `header` and/or `skiprows` parameters of the `read_excel()` function (`ExcelFile.parse()` accepts them too).
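A sketch of a two-pass `skiprows` approach, assuming the marker row can be found with the same condition as in the question. To stay self-contained it reads an in-memory CSV with `pd.read_csv`, which accepts the same `header` and `skiprows` parameters; the CSV content is made up. It also assumes there are no blank lines above the marker row, since blank lines may be dropped during parsing and shift the row positions:

```python
import io

import pandas as pd

# Made-up stand-in for the sheet contents.
csv_text = "Report,\nSome metadata,\nvar1,var2\n1,10\n2,20\n"
value_list = ["var1", "var2"]

# Pass 1: read everything without a header to locate the marker row.
raw = pd.read_csv(io.StringIO(csv_text), header=None)
is_marker = raw[0].isin(value_list) & raw[1].isin(value_list)
header_pos = int(is_marker.to_numpy().nonzero()[0][-1])

# Pass 2: skip the lines above the marker row; with the default header=0
# the marker row then becomes the column names.
data = pd.read_csv(io.StringIO(csv_text), skiprows=header_pos)
print(data)
```

With the workbook from the question, the second pass would be something like `xlsx_file.parse('ActualSheet', skiprows=header_pos)`, which also turns the `var1`/`var2` row into column names instead of discarding it.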

Upvotes: 1
