Reputation: 323
Suppose I have the following dataframe:
df = pd.DataFrame({'A':[1,2,3,400], 'B':[100,2,3,4]})
And I want to find the locations (by index and column) of every element larger than 50, i.e. a correct output would be:
[(3,'A'), (0,'B')]
What would be the most pythonic way of doing this?
Upvotes: 1
Views: 1209
Reputation: 25672
It might be worth considering whether you actually need a MultiIndex here, since a DataFrame will work just as well. In addition, a DataFrame puts a whole world of fast operations at your fingertips, which is not the case with a MultiIndex:
In [44]: df = pd.DataFrame({'A':[1,2,3,400], 'B':[100,2,3,4]})
In [45]: df = df.reset_index()
In [46]: df
Out[46]:
   index    A    B
0      0    1  100
1      1    2    2
2      2    3    3
3      3  400    4
In [47]: molten = pd.melt(df, var_name='column', id_vars='index')
In [48]: molten
Out[48]:
   index column  value
0      0      A      1
1      1      A      2
2      2      A      3
3      3      A    400
4      0      B    100
5      1      B      2
6      2      B      3
7      3      B      4
In [49]: molten[molten.value > 50]
Out[49]:
   index column  value
3      3      A    400
4      0      B    100
With this method, you get to keep all of your labeling and the values whose indices you're interested in.
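If you still want the list of (index, column) tuples from the question, the filtered frame can be zipped back together. This is only a sketch building on the session above; the names `hits` and `locations` are mine, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 400], 'B': [100, 2, 3, 4]})

# Long format: one row per (index, column, value) triple.
molten = pd.melt(df.reset_index(), id_vars='index', var_name='column')

# Keep only the rows whose value exceeds the threshold.
hits = molten[molten['value'] > 50]

# Pair each row label with its column label as plain Python objects.
locations = list(zip(hits['index'].tolist(), hits['column'].tolist()))
print(locations)  # [(3, 'A'), (0, 'B')]
```

Because melt walks the columns one after another, the tuples come out column by column, which happens to match the order asked for in the question.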
As a side note, when I first discovered MultiIndexes I thought they were the greatest thing since sliced bread. After using pandas on a regular basis for various tasks, I've found that they are often a hindrance, since they behave sort of like a DataFrame and sort of like an Index.
Upvotes: 1
Reputation: 3507
Almost the same as the stack approach, but without creating any intermediate variable:
>>> df[df>50].stack().index.tolist()
[(0L, 'B'), (3L, 'A')]
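One caveat with the one-liner: it relies on stack() throwing away the NaN entries that df[df>50] leaves behind. As far as I know that behaviour depends on the pandas version (newer releases changed how stack treats missing values), so here is a sketch with an explicit dropna(), which should be harmless wherever the NaN are already dropped. The variable names are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 400], 'B': [100, 2, 3, 4]})

# df[df > 50] keeps the original shape and puts NaN where the test fails.
masked = df[df > 50]

# Drop the NaN explicitly instead of relying on stack() to discard them.
locations = masked.stack().dropna().index.tolist()
print(locations)
```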
Upvotes: 3
Reputation: 375695
You could use stack here and then use a boolean mask (for those values over 50):
In [11]: s = df.stack()
In [12]: s
Out[12]:
0  A      1
   B    100
1  A      2
   B      2
2  A      3
   B      3
3  A    400
   B      4
dtype: int64
In [13]: s[s > 50]
Out[13]:
0  B    100
3  A    400
dtype: int64
In [14]: s[s > 50].index
Out[14]:
MultiIndex
[(0, u'B'), (3, u'A')]
If you require this as a list:
In [15]: s[s > 50].index.tolist()
Out[15]: [(0, 'B'), (3, 'A')]
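If you need this for more than one threshold, the steps above can be wrapped in a small helper. This is just a sketch; `locations_above` is a made-up name, not a pandas function.

```python
import pandas as pd

def locations_above(frame, threshold):
    """Return (index, column) tuples for values strictly greater than threshold."""
    # Stack the columns into a Series keyed by (index, column).
    s = frame.stack()
    # Boolean mask on the values, then read the labels off the MultiIndex.
    return s[s > threshold].index.tolist()

df = pd.DataFrame({'A': [1, 2, 3, 400], 'B': [100, 2, 3, 4]})
result = locations_above(df, 50)
print(result)
```

Calling it with a threshold nothing exceeds, such as locations_above(df, 1000), gives an empty list.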
Upvotes: 3