TheSuperbard

Reputation: 75

Select columns in a DataFrame conditional on row

I am trying to build a DataFrame (or Series) from another DataFrame, selecting a different column of the first frame for each row according to a second Series. In the simplified example below, I want the frame1 values from 'a' for the first three rows and from 'b' for the final two (as specified by the picked_values series).

import numpy as np
import pandas as pd

frame1 = pd.DataFrame(np.random.randn(5, 2), index=range(5), columns=['a', 'b'])
picked_values = pd.Series(['a', 'a', 'a', 'b', 'b'])

frame1:

    a           b
0   0.283519    1.462209
1   -0.352342   1.254098
2   0.731701    0.236017
3   0.022217    -1.469342
4   0.386000    -0.706614

Trying to get to the series:

0   0.283519
1   -0.352342
2   0.731701
3   -1.469342
4   -0.706614

I was hoping frame1[picked_values] would work, but this selects a whole column per label and ends up with five columns.

In the real-life case, picked_values is much larger and is computed rather than typed in by hand.

Thank you for your time.

Upvotes: 6

Views: 116

Answers (2)

yatu

Reputation: 88226

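Here's a NumPy-based approach using integer indexing and Index.searchsorted. It assumes the column labels are sorted and that the index is the default 0..n-1 range, since frame1.index is used directly as row positions: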

frame1.values[frame1.index, frame1.columns.searchsorted(picked_values.values)]
# array([0.22095278, 0.86200616, 1.88047197, 0.49816937, 0.10962954])
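If the columns are not sorted, or the index is not a plain integer range, a variant along these lines may be safer. It is only a sketch: it swaps searchsorted for Index.get_indexer and uses np.arange for the row positions, and it hard-codes the numbers from the question so the result can be compared directly. It assumes picked_values has one label per row of frame1, in row order.

```python
import numpy as np
import pandas as pd

# Data copied from the question so the result is reproducible.
frame1 = pd.DataFrame(
    {
        "a": [0.283519, -0.352342, 0.731701, 0.022217, 0.386000],
        "b": [1.462209, 1.254098, 0.236017, -1.469342, -0.706614],
    }
)
picked_values = pd.Series(["a", "a", "a", "b", "b"])

# Translate each column label into its integer position.
# get_indexer returns -1 for labels that are not in the columns.
col_pos = frame1.columns.get_indexer(picked_values)
if (col_pos == -1).any():
    raise KeyError("picked_values contains labels that are not columns of frame1")

# Row positions 0..n-1, independent of what the index labels are.
row_pos = np.arange(len(frame1))

# Pair up (row, column) positions with NumPy integer indexing.
result = pd.Series(frame1.to_numpy()[row_pos, col_pos], index=frame1.index)
print(result)
```

The explicit check for -1 matters because NumPy would otherwise treat -1 as "last column" and return a wrong value without complaining.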

Upvotes: 3

anky

Reputation: 75080

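Use df.lookup (note that DataFrame.lookup was deprecated in pandas 1.2.0 and removed in 2.0.0, so this only works on older versions):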

pd.Series(frame1.lookup(picked_values.index,picked_values))

0    0.283519
1   -0.352342
2    0.731701
3   -1.469342
4   -0.706614
dtype: float64
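For pandas versions without DataFrame.lookup, one possible replacement is the factorize-and-reindex pattern sketched below. This is an illustration rather than a drop-in from the answer: the data is hard-coded from the question, and it assumes picked_values is aligned row by row with frame1 and contains no missing values (factorize encodes NaN as -1, which NumPy would read as the last column).

```python
import numpy as np
import pandas as pd

# Data copied from the question so the result is reproducible.
frame1 = pd.DataFrame(
    {
        "a": [0.283519, -0.352342, 0.731701, 0.022217, 0.386000],
        "b": [1.462209, 1.254098, 0.236017, -1.469342, -0.706614],
    }
)
picked_values = pd.Series(["a", "a", "a", "b", "b"])

# codes: one integer per row; labels: the unique column names,
# in order of first appearance.
codes, labels = pd.factorize(picked_values)

# Reorder the columns to match `labels`, so that codes index them directly.
values = frame1.reindex(labels, axis=1).to_numpy()

# Pick one value per row: row i, column codes[i].
picked = pd.Series(values[np.arange(len(frame1)), codes], index=frame1.index)
print(picked)
```

Because the columns are reindexed to the factorized labels, this does not depend on the original column order.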

Upvotes: 6
