Reputation: 1527
I have dataframes with a varying amount of header data. I need to remove the header data (i.e., create a new dataframe containing only the data that comes after this header).
I have used the following code to find the row where the header data ends.
df = xlsx_file.parse('ActualSheet', header=None)
value_list = ['var1', 'var2']
df_Header = df[df[0].isin(value_list) & df[1].isin(value_list)]
The above code works and creates a dataframe of the final row of header data.
I am having trouble creating a new dataframe from the original data that only includes the rows AFTER this "df_Header" row.
Any help is appreciated, I know the answer is already out there but I could not find it.
Upvotes: 0
Views: 51
Reputation: 210882
IIUC you can do it this way:
df = df[df_Header.index.max() + 1:]
or, to drop only the matching row itself (rows above it are kept):
df = df[~(df[0].isin(value_list) & (df[1].isin(value_list)))]
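For illustration, here is a small self-contained sketch of the slicing approach. The sample rows are made up (the real data would come from xlsx_file.parse), and the names mask, header_pos and data are just placeholders. It looks up the position of the last marker row and uses iloc, so it should not depend on the index being the default 0..n-1 range.

```python
import pandas as pd

# Made-up sheet contents: two rows of header noise, the marker row, then data.
df = pd.DataFrame([
    ["report", "2020"],
    ["notes", None],
    ["var1", "var2"],
    [1, 2],
    [3, 4],
])
value_list = ["var1", "var2"]

# True only on the row where both columns hold one of the marker values.
mask = df[0].isin(value_list) & df[1].isin(value_list)

# Positional index of the last marker row.
header_pos = mask.to_numpy().nonzero()[0].max()

# Keep only the rows after the marker row and reuse that row as column names.
data = df.iloc[header_pos + 1:].reset_index(drop=True)
data.columns = df.iloc[header_pos].tolist()
```

Reusing the marker row as column names is optional; drop the last line if you only want the rows that follow it.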
PS: you may also want to make use of the header and/or skiprows parameters of the read_excel() function.
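A rough sketch of what those two parameters do. To keep it runnable without an Excel file it uses read_csv on an in-memory string as a stand-in; read_excel takes header and skiprows with the same meaning. The sample text is invented, and this only helps when you know up front how many leading rows to skip.

```python
import io
import pandas as pd

# Invented file contents: two junk lines, then the real header, then data.
raw = "report,2020\nnotes,\nvar1,var2\n1,2\n3,4\n"

# Skip the first two lines, then treat the next line as the column names.
data = pd.read_csv(io.StringIO(raw), skiprows=2, header=0)

# Alternatively, point header at the row that holds the column names;
# the rows before it are discarded.
same = pd.read_csv(io.StringIO(raw), header=2)
```

If the number of header rows varies from file to file, you would still need to locate the marker row first, as in the question.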
Upvotes: 1