Reputation: 59
I'm trying to filter out columns without a particular string as the second row value. In this case, let's say I only want the columns with Apple
in them to display, and the Kiwi
columns to not be displayed.
It works no problem if the string value is in the first row instead, however, it does not work at all in the second row. I know a change is required in the fourth line of code bolded below, but I'm stuck. Any ideas?
import pandas as pd
file = 'K:\blah blah\blah\\sampledata.xlsx'
inp = pd.read_excel(file, header=1)
result = inp.iloc[:, inp.columns.isin(['Apple'])]
Upvotes: 0
Views: 1130
Reputation: 11
You can read the whole DataFrame
and drop Kiwi
from the columns:
df = pd.DataFrame({"Apple": [1, 2, 3], "Kiwi": [4, 5, 6]})
df.drop(columns="Kiwi")
Otherwise, you can specify the columns you don't want in a function passed to usecols
:
pd.read_excel(file, usecols=lambda col: "Kiwi" not in col)
This method should be faster https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html
Upvotes: 1
Reputation: 863611
I think first and second rows of data are headers, so use header=[0,1]
for MultiIndex
in read_excel
:
inp = pd.read_excel(file, header=[0,1])
And then select by DataFrame.xs
:
df = inp.xs('Bananas', axis=1, level=1)
Upvotes: 0