Reputation: 3047
I have the following:
x = pd.DataFrame({'a':[1,5,5], 'b':[7,0,7]})
And for every row, i want to get the index of the first column that met the condition that it's value is greater than some value, let's say greater than 4.
In this example, the answer is 1, (correspond to the index of the value 7 in the first row) and 0 (correspond to the index of the value 5 in the second row), and 1(correspond to the index of the value 5 in the third row). Which means the answer is [1,0,0].
I tried it with apply method:
def get_values_from_row(row, th=0.9):
"""Get a list of column names that meet some condition that their values are larger than a threshold.
Args:
row(pd.DataFrame): a row.
th(float): the threshold.
Returns:
string. contains the columns that it's value met the condition.
"""
return row[row > th].index.tolist()[0]
It works, but i have a large data set, and it's quite slow. What is a better alternative.
Upvotes: 3
Views: 5236
Reputation: 862641
I think you need first_valid_index
with get_loc
:
print (x[x > 4])
a b
0 NaN 7.0
1 5.0 NaN
2 7.0 5.0
print (x[x > 4].apply(lambda x: x.index.get_loc(x.first_valid_index()), axis=1))
0 1
1 0
2 0
dtype: int64
Upvotes: 5