Reputation: 77
I have a dataframe that looks like this:
np.random.seed(21)
df = pd.DataFrame(np.random.randn(8, 4), columns=['A', 'B1', 'B2', 'B3'])
df['current_State'] = [df['B1'][0], df['B1'][1], df['B2'][2], df['B2'][3], df['B3'][4], df['B3'][5], df['B1'][6], df['B2'][7]]
df
I need to create a new column that contains the name of the column where the value of 'current_State' is the same, this is the desired output:
I tried many combinations of apply and lambda functions but without success. Any help is very welcome!
Upvotes: 4
Views: 181
Reputation: 71689
You can compare the current_State
column with all the remaining columns to create a boolean mask, then use idxmax
along axis=1
on this mask
to get the name of the column where the value in the given row equal to corresponding value in current_State
:
c = 'current_State'
df['new_column'] = df.drop(c, 1).eq(df[c], axis=0).idxmax(1)
In case if there is a possibility that there are no matching values we can instead use:
c = 'current_State'
m = df.drop(c, 1).eq(df[c], axis=0)
df['new_column'] = m.idxmax(1).mask(~m.any(1))
>>> df
A B1 B2 B3 current_State new_column
0 -0.051964 -0.111196 1.041797 -1.256739 -0.111196 B1
1 0.745388 -1.711054 -0.205864 -0.234571 -1.711054 B1
2 1.128144 -0.012626 -0.613200 1.373688 -0.613200 B2
3 1.610992 -0.689228 0.691924 -0.448116 0.691924 B2
4 0.162342 0.257229 -1.275456 0.064004 0.064004 B3
5 -1.061857 -0.989368 -0.457723 -1.984182 -1.984182 B3
6 -1.476442 0.231803 0.644159 0.852123 0.231803 B1
7 -0.464019 0.697177 1.567882 1.178556 1.567882 B2
Upvotes: 2