user2383521

Reputation: 323

Identify elements of a dataframe satisfying a condition

Suppose I have the following dataframe:

df = pd.DataFrame({'A':[1,2,3,400], 'B':[100,2,3,4]})

And I want to find the locations (by index and column) of every element larger than 50, i.e. a correct output would be:

[(3,'A'), (0,'B')]

What would be the most pythonic way of doing this?

Upvotes: 1

Views: 1209

Answers (3)

Phillip Cloud

Reputation: 25672

It might be worth considering whether you actually need a MultiIndex here; a DataFrame will work just as well. In addition, a DataFrame puts a whole world of fast operations at your fingertips, which is not the case with a MultiIndex:

In [44]: df = pd.DataFrame({'A':[1,2,3,400], 'B':[100,2,3,4]})

In [45]: df = df.reset_index()

In [46]: df
Out[46]:
   index    A    B
0      0    1  100
1      1    2    2
2      2    3    3
3      3  400    4

In [47]: molten = pd.melt(df, var_name='column', id_vars='index')

In [48]: molten
Out[48]:
   index column  value
0      0      A      1
1      1      A      2
2      2      A      3
3      3      A    400
4      0      B    100
5      1      B      2
6      2      B      3
7      3      B      4

In [49]: molten[molten.value > 50]
Out[49]:
   index column  value
3      3      A    400
4      0      B    100

With this method, you get to keep all of your labeling and the values whose indices you're interested in.
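
If you need the exact list of tuples from the question, the filtered frame can be zipped back together. A minimal sketch of that step; the names `hits` and `locations` are only illustrative and not part of the session above:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 400], 'B': [100, 2, 3, 4]})

# Move the row labels into a regular column, then melt to long format:
# one row per (index, column, value) triple.
molten = pd.melt(df.reset_index(), id_vars='index', var_name='column')

# Keep only the rows whose value exceeds the threshold.
hits = molten[molten['value'] > 50]

# Pair each row label with its column label.
locations = list(zip(hits['index'], hits['column']))
print(locations)
```

Since melt works through the frame one column at a time, the pairs come out grouped by column, so the hit in A is listed before the hit in B.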

As a side note, when I first discovered MultiIndexes I thought they were the greatest thing since sliced bread. After using pandas on a regular basis for various tasks, I've found that they are often a hindrance since they behave sort of like a DataFrame and sort of like an Index.

Upvotes: 1

Nic

Reputation: 3507

Almost the same as the stack-based answer, but without creating any intermediate variable:

>>> df[df>50].stack().index.tolist()
[(0L, 'B'), (3L, 'A')]
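
One thing to watch: the one-liner relies on stack() silently dropping the NaN entries that df[df>50] produces. As far as I know, newer pandas releases have been changing that default, so it may be safer to drop them explicitly. A sketch under that assumption (the names `masked` and `locations` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 400], 'B': [100, 2, 3, 4]})

# df[df > 50] keeps the shape of df and puts NaN wherever the condition fails.
masked = df[df > 50]

# The explicit dropna() makes the result independent of whether
# stack() discards the NaN rows on its own.
locations = masked.stack().dropna().index.tolist()
print(locations)
```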

Upvotes: 3

Andy Hayden

Reputation: 375695

You could use stack here and then use a boolean mask (for those values over 50):

In [11]: s = df.stack()

In [12]: s
Out[12]:
0  A      1
   B    100
1  A      2
   B      2
2  A      3
   B      3
3  A    400
   B      4
dtype: int64

In [13]: s[s > 50]
Out[13]:
0  B    100
3  A    400
dtype: int64

In [14]: s[s > 50].index
Out[14]:
MultiIndex
[(0, u'B'), (3, u'A')]

If you require this as a list:

In [15]: s[s > 50].index.tolist()
Out[15]: [(0, 'B'), (3, 'A')]
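
If this comes up often, the steps above could be wrapped in a small helper. This is only a sketch: `find_locations` is a hypothetical name, not a pandas function, and it assumes the values can be compared with `>`:

```python
import pandas as pd


def find_locations(frame, threshold):
    """Return (row label, column label) pairs whose value exceeds threshold.

    Hypothetical helper, not part of pandas.
    """
    # Stack first, so the boolean mask is applied to a Series of real
    # values and no NaN placeholders are ever created.
    s = frame.stack()
    return s[s > threshold].index.tolist()


df = pd.DataFrame({'A': [1, 2, 3, 400], 'B': [100, 2, 3, 4]})
result = find_locations(df, 50)
print(result)
```

Masking after stacking, rather than before, means the result does not depend on how stack() treats missing values.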

Upvotes: 3
