Steven C. Howell
Steven C. Howell

Reputation: 18624

Select column from DataFrame rows using a list of column labels

I have a list of values with the column index I want for each row of a pandas DataFrame. How do I map this list of column labels to each row of the DataFrame?

If I simply index the DataFrame using the list, the entire list gets applied to every row, like this.

In [10]: df = pd.DataFrame(np.random.randn(5,2), columns=list('AB'))

In [11]: df
Out[11]: 
          A         B
0 -0.082240 -2.182937
1  0.380396  0.084844
2  0.432390  1.519970
3 -0.493662  0.600178
4  0.274230  0.132885

In[12]: selection = list('ABBAA')

In[13]: selection
Out[13]: ['A', 'B', 'B', 'A', 'A']

In[14]: df[selection]
Out[14]:
          A         B         B         A         A
0 -0.082240 -2.182937 -2.182937 -0.082240 -0.082240
1  0.380396  0.084844  0.084844  0.380396  0.380396  
2  0.432390  1.519970  1.519970  0.432390  0.432390
3 -0.493662  0.600178  0.600178 -0.493662 -0.493662
4  0.274230  0.132885  0.132885  0.274230  0.274230

Each element in the selection list indicates the column to select from the corresponding row in the DataFrame. In this example, I want column A from the first row, B from the second and third, then A from the fourth and fifth. It works out that this is the diagonal of the above result. My actual DataFrame is much larger and I don't think it makes sense to build the above result just to select out the diagonal.

I can certainly get at this by looping over the rows but I expect Pandas has a built-in way to do this. I am looking for the method to get the following result.

In[15]: df <do something> selection
Out[15]:
0 -0.082240
1  0.084844
2  1.519970
3 -0.493662
4  0.274230

Upvotes: 3

Views: 1258

Answers (4)

Vaishali
Vaishali

Reputation: 38415

You are slicing the dataframe twice if you first select columns based on selection and then get the diagonal values. You can instead use lookup which returns an array of the values corresponding to each (row, col) pair.

df.lookup(df.index, selection)

array([-0.08224 ,  0.084844,  1.51997 , -0.493662,  0.27423 ])

If you want the data in form of Pandas series,

pd.Series(df.lookup(df.index, selection))

0   -0.082240
1    0.084844
2    1.519970
3   -0.493662
4    0.274230

Upvotes: 5

BENY
BENY

Reputation: 323326

I can only think of numpy method

pd.Series(df.values[df.index,df.columns.get_indexer(selection)])
Out[563]: 
0   -0.082240
1    0.084844
2    1.519970
3   -0.493662
4    0.274230
dtype: float64

Upvotes: 3

meW
meW

Reputation: 3967

IIUC, try this

df[selection].values.diagonal()

If, you need in the form of Series as you remarked, do -

pd.Series(df[selection].values.diagonal())

Upvotes: 1

Scott Boston
Scott Boston

Reputation: 153500

Use reindex:

df.reindex(selection, axis=1)

Output:

          A         B         B         A         A
0  0.065447 -1.890299 -1.890299  0.065447  0.065447
1  0.389780  0.301049  0.301049  0.389780  0.389780
2  0.484159 -1.311432 -1.311432  0.484159  0.484159
3 -0.209745 -2.233050 -2.233050 -0.209745 -0.209745
4 -0.093495 -1.527827 -1.527827 -0.093495 -0.093495

Then, use np.eye and mask

df.reindex(selection, axis=1).mask(np.eye(5) == 0).stack()

Output:

0  A    0.065447
1  B    0.301049
2  B   -1.311432
3  A   -0.209745
4  A   -0.093495
dtype: float64

Upvotes: 3

Related Questions