How to select column based on condition in row?

Question

I have a dataframe df, and want to select only the columns that have a specific value "xyz" in row [0].

I know if I were trying to select rows with a specific value "xyz" in column [0] I would just do:

df[df[0] == 'xyz']

Now I try:

df[df.loc[0] == 'xyz']

and I get error:

IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match

Why won't the same operation work for selecting columns with a condition? And what's the easiest solution to get just the columns I want?

Valdi_Bo · Accepted Answer

Assume that df contains:

    Aa   Bb   Cc   Dd
0  xyz  abc  xyz  efg
1   e1   e2   e3   e4
2   f1   f2   f3   f4

To generate a bool vector (actually a Series) checking whether consecutive cells in row 0 == 'xyz', you can run:

df.iloc[0] == 'xyz'

The result is:

Aa     True
Bb    False
Cc     True
Dd    False
Name: 0, dtype: bool

To get the names of columns of interest, run:

df.columns[df.iloc[0] == 'xyz']

The result is:

Index(['Aa', 'Cc'], dtype='object')

So to get your expected result, retrieve df[...] with the above expression between brackets:

df[df.columns[df.iloc[0] == 'xyz']]

The result is:

    Aa   Cc
0  xyz  xyz
1   e1   e3
2   f1   f3

A comment concerning your code samples

Your first code sample - df[0] == 'xyz' failed because df[0] attempts to retrieve a column with name 0. Since your DataFrame doesn't contain such column, a KeyError: 0 exception is raised.

The second code sample - df[df.loc[0] == 'xyz'] is a better approach. Note that df.loc[0] == 'xyz' is just what I proposed.

The reason of failure is that you now attempt something like boolean indexing on columns, but in this case it does not work. Between brackets you can pass a list of column names, not a list of "boolean indicators".

How to select column based on condition in row?

Answers (2)

A comment concerning your code samples

Related Questions