dmenesesg
dmenesesg

Reputation: 77

How to compare a value of a single column over multiple columns in the same row using pandas?

I have a dataframe that looks like this:

np.random.seed(21) 
df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B1', 'B2', 'B3'])
df['current_State'] = [df['B1'][0], df['B1'][1], df['B2'][2], df['B2'][3], df['B3'][4], df['B3'][5], df['B1'][6], df['B2'][7]]
df 

Click here to see the df

I need to create a new column that contains the name of the column where the value of 'current_State' is the same, this is the desired output:

desired output

I tried many combinations of apply and lambda functions but without success. Any help is very welcome!

Upvotes: 4

Views: 181

Answers (1)

Shubham Sharma
Shubham Sharma

Reputation: 71689

You can compare the current_State column with all the remaining columns to create a boolean mask, then use idxmax along axis=1 on this mask to get the name of the column where the value in the given row equal to corresponding value in current_State:

c = 'current_State'
df['new_column'] = df.drop(c, 1).eq(df[c], axis=0).idxmax(1)

In case if there is a possibility that there are no matching values we can instead use:

c = 'current_State'
m = df.drop(c, 1).eq(df[c], axis=0)
df['new_column'] = m.idxmax(1).mask(~m.any(1))

>>> df

          A        B1        B2        B3  current_State new_column
0 -0.051964 -0.111196  1.041797 -1.256739      -0.111196         B1
1  0.745388 -1.711054 -0.205864 -0.234571      -1.711054         B1
2  1.128144 -0.012626 -0.613200  1.373688      -0.613200         B2
3  1.610992 -0.689228  0.691924 -0.448116       0.691924         B2
4  0.162342  0.257229 -1.275456  0.064004       0.064004         B3
5 -1.061857 -0.989368 -0.457723 -1.984182      -1.984182         B3
6 -1.476442  0.231803  0.644159  0.852123       0.231803         B1
7 -0.464019  0.697177  1.567882  1.178556       1.567882         B2

Upvotes: 2

Related Questions