Sid

Reputation: 123

Identifying column with the first value in the row based on a condition

I have a DataFrame in which, for each row, I want to identify the first column whose value meets a condition. In the example below, I want to create a new column holding the name of the first column in each row whose value is less than or equal to 1.

import pandas as pd

df = pd.DataFrame({'A': [1.5,2,4,0.5], 'B' : [2,1,3,0.25], 'C': [3,1,1,1], 'D': [2,2,3,1]})
df
    A    B      C   D
0   1.5  2.00   3   2
1   2.0  1.00   1   2
2   4.0  3.00   1   3
3   0.5  0.25   1   1

I can create a mask to check for the condition.

temp = df<=1
temp
    A       B       C       D
0   False   False   False   False
1   False   True    True    False
2   False   False   True    False
3   True    True    True    True

I can then use the following to determine the columns.

df['New_col'] = temp.idxmax(axis = 1)
df
    A    B      C   D   New_col
0   1.5  2.00   3   2   A
1   2.0  1.00   1   2   B
2   4.0  3.00   1   3   C
3   0.5  0.25   1   1   A

New_col is correct for every row except row 0. All of the values in row 0 are greater than 1, so the mask is all False for that row and idxmax falls back to the first column, A. How can I get NaN instead of A for row 0 in New_col?

Below is the output desired.

    A    B      C   D   New_col
0   1.5  2.00   3   2   NaN
1   2.0  1.00   1   2   B
2   4.0  3.00   1   3   C
3   0.5  0.25   1   1   A

Thanks.

Upvotes: 1

Views: 38

Answers (1)

Quang Hoang

Reputation: 150735

Use any(axis=1) to check whether each row contains at least one True, and where to mask the rows that don't:

df['New_col'] = temp.idxmax(axis=1).where(temp.any(axis=1))

Output:

     A     B  C  D New_col
0  1.5  2.00  3  2     NaN
1  2.0  1.00  1  2       B
2  4.0  3.00  1  3       C
3  0.5  0.25  1  1       A
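
For reference, here is the whole thing as one self-contained sketch, using the data from the question. The names mask and first_hit are just illustrative. The axis is passed by keyword because, as far as I know, recent pandas versions no longer accept it positionally in any().

```python
import pandas as pd

df = pd.DataFrame({'A': [1.5, 2, 4, 0.5],
                   'B': [2, 1, 3, 0.25],
                   'C': [3, 1, 1, 1],
                   'D': [2, 2, 3, 1]})

# Boolean mask: True where the value meets the condition
mask = df <= 1

# idxmax returns the label of the first True in each row,
# but falls back to the first column when the row is all False
first_hit = mask.idxmax(axis=1)

# Keep the label only for rows with at least one True; others become NaN
df['New_col'] = first_hit.where(mask.any(axis=1))

print(df)
```

Computing the mask before adding New_col matters: once the string column is in df, comparing the whole frame with <= 1 would no longer work.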

Upvotes: 2
