clueless clouder
clueless clouder

Reputation: 59

Using .iloc and .isin to filter columns based on the value of the second row (not first) in Python

I'm trying to filter out columns without a particular string as the second row value. In this case, let's say I only want the columns with Apple in them to display, and the Kiwi columns to not be displayed.

It works no problem if the string value is in the first row instead, however, it does not work at all in the second row. I know a change is required in the fourth line of code bolded below, but I'm stuck. Any ideas?

import pandas as pd

file = 'K:\blah blah\blah\\sampledata.xlsx'

inp = pd.read_excel(file, header=1)

result = inp.iloc[:, inp.columns.isin(['Apple'])]

Upvotes: 0

Views: 1130

Answers (2)

Giulio Beseghi
Giulio Beseghi

Reputation: 11

You can read the whole DataFrame and drop Kiwi from the columns:

df = pd.DataFrame({"Apple": [1, 2, 3], "Kiwi": [4, 5, 6]})
df.drop(columns="Kiwi")

Otherwise, you can specify the columns you don't want in a function passed to usecols:

pd.read_excel(file, usecols=lambda col: "Kiwi" not in col)

This method should be faster https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html

Upvotes: 1

jezrael
jezrael

Reputation: 863611

I think first and second rows of data are headers, so use header=[0,1] for MultiIndex in read_excel:

inp = pd.read_excel(file, header=[0,1])

And then select by DataFrame.xs:

df = inp.xs('Bananas', axis=1, level=1)

Upvotes: 0

Related Questions