Reputation: 123
I have a dataframe where, for each row, I want to identify the first column whose value meets a condition. In the example below, I want to create a new column that gives, for each row, the name of the first column whose value is less than or equal to 1.
import pandas as pd
df = pd.DataFrame({'A': [1.5, 2, 4, 0.5], 'B': [2, 1, 3, 0.25], 'C': [3, 1, 1, 1], 'D': [2, 2, 3, 1]})
df
     A     B  C  D
0  1.5  2.00  3  2
1  2.0  1.00  1  2
2  4.0  3.00  1  3
3  0.5  0.25  1  1
I can create a mask to check for the condition.
temp = df<=1
temp
       A      B      C      D
0  False  False  False  False
1  False   True   True  False
2  False  False   True  False
3   True   True   True   True
I can then use the following to determine the columns.
df['New_col'] = temp.idxmax(axis = 1)
df
     A     B  C  D New_col
0  1.5  2.00  3  2       A
1  2.0  1.00  1  2       B
2  4.0  3.00  1  3       C
3  0.5  0.25  1  1       A
The code identifies the column in New_col correctly for every row except row 0: all of the values in row 0 are greater than 1, yet idxmax still returns A. How can I get NaN instead of A for row 0 in New_col?
The desired output is below.
     A     B  C  D New_col
0  1.5  2.00  3  2     NaN
1  2.0  1.00  1  2       B
2  4.0  3.00  1  3       C
3  0.5  0.25  1  1       A
Thanks.
Upvotes: 1
Views: 38
Reputation: 150735
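Use any(axis=1) to check whether each row contains at least one True, and where to mask the rows that don't (pass axis as a keyword: the positional form any(1) is no longer accepted in pandas 2.0 and later):
df['New_col'] = temp.idxmax(axis=1).where(temp.any(axis=1))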
Output:
     A     B  C  D New_col
0  1.5  2.00  3  2     NaN
1  2.0  1.00  1  2       B
2  4.0  3.00  1  3       C
3  0.5  0.25  1  1       A
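For reference, here is the whole thing as one self-contained sketch. The helper name first_col_where is made up for illustration and is not a pandas function. It just wraps the idxmax/where combination above and assumes the mask is a boolean DataFrame. Row 0 came out as A in the original attempt because idxmax returns the first column holding the row maximum, and in an all-False row every column ties, so the first one wins.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.5, 2, 4, 0.5], 'B': [2, 1, 3, 0.25],
                   'C': [3, 1, 1, 1], 'D': [2, 2, 3, 1]})


def first_col_where(mask):
    """Hypothetical helper: label of the first True column per row, NaN if none."""
    # idxmax picks the first column holding the row maximum (True > False).
    # For an all-False row that is just the first column, so those rows are
    # blanked out with where(), which keeps values only where any() is True.
    return mask.idxmax(axis=1).where(mask.any(axis=1))


# The mask is computed before New_col is added, so only A..D are compared.
df['New_col'] = first_col_where(df <= 1)
print(df)
```

The same helper should work for any other row-wise condition, e.g. first_col_where(df > 3), as long as the comparison yields a boolean frame.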
Upvotes: 2