Reputation: 18624
I have a list of values with the column index I want for each row of a pandas DataFrame. How do I map this list of column labels to each row of the DataFrame?
If I simply index the DataFrame using the list, the entire list gets applied to every row, like this.
In [10]: df = pd.DataFrame(np.random.randn(5,2), columns=list('AB'))
In [11]: df
Out[11]:
A B
0 -0.082240 -2.182937
1 0.380396 0.084844
2 0.432390 1.519970
3 -0.493662 0.600178
4 0.274230 0.132885
In[12]: selection = list('ABBAA')
In[13]: selection
Out[13]: ['A', 'B', 'B', 'A', 'A']
In[14]: df[selection]
Out[14]:
A B B A A
0 -0.082240 -2.182937 -2.182937 -0.082240 -0.082240
1 0.380396 0.084844 0.084844 0.380396 0.380396
2 0.432390 1.519970 1.519970 0.432390 0.432390
3 -0.493662 0.600178 0.600178 -0.493662 -0.493662
4 0.274230 0.132885 0.132885 0.274230 0.274230
Each element in the selection list indicates the column to select from the corresponding row in the DataFrame. In this example, I want column A
from the first row, B
from the second and third, then A
from the fourth and fifth. It works out that this is the diagonal of the above result. My actual DataFrame is much larger and I don't think it makes sense to build the above result just to select out the diagonal.
I can certainly get at this by looping over the rows but I expect Pandas has a built-in way to do this. I am looking for the method to get the following result.
In[15]: df <do something> selection
Out[15]:
0 -0.082240
1 0.084844
2 1.519970
3 -0.493662
4 0.274230
Upvotes: 3
Views: 1258
Reputation: 38415
You are slicing the dataframe twice if you first select columns based on selection and then get the diagonal values. You can instead use lookup which returns an array of the values corresponding to each (row, col) pair.
df.lookup(df.index, selection)
array([-0.08224 , 0.084844, 1.51997 , -0.493662, 0.27423 ])
If you want the data in form of Pandas series,
pd.Series(df.lookup(df.index, selection))
0 -0.082240
1 0.084844
2 1.519970
3 -0.493662
4 0.274230
Upvotes: 5
Reputation: 323326
I can only think of numpy
method
pd.Series(df.values[df.index,df.columns.get_indexer(selection)])
Out[563]:
0 -0.082240
1 0.084844
2 1.519970
3 -0.493662
4 0.274230
dtype: float64
Upvotes: 3
Reputation: 3967
IIUC, try this
df[selection].values.diagonal()
If, you need in the form of Series
as you remarked, do -
pd.Series(df[selection].values.diagonal())
Upvotes: 1
Reputation: 153500
Use reindex
:
df.reindex(selection, axis=1)
Output:
A B B A A
0 0.065447 -1.890299 -1.890299 0.065447 0.065447
1 0.389780 0.301049 0.301049 0.389780 0.389780
2 0.484159 -1.311432 -1.311432 0.484159 0.484159
3 -0.209745 -2.233050 -2.233050 -0.209745 -0.209745
4 -0.093495 -1.527827 -1.527827 -0.093495 -0.093495
Then, use np.eye
and mask
df.reindex(selection, axis=1).mask(np.eye(5) == 0).stack()
Output:
0 A 0.065447
1 B 0.301049
2 B -1.311432
3 A -0.209745
4 A -0.093495
dtype: float64
Upvotes: 3