Reputation: 1227
I have the following DataFrame df:
col1 col2 col3
50 dd 3
2 r NaN
5 d 4
a e 5
I need to calculate the median value for selected columns cols, and then check whether any of the values in those columns deviate from the median by more than 20%.
I am not sure how to tackle mixed values in a single row to make these calculations.
def test_row(x, threshold):
    if x.dtype == int or x.dtype == float:
        return x > threshold

columns = ["col1", "col3"]
for col in columns:
    threshold = df[col].median()*(20/100)
    check = df.apply(lambda x: test_row(x[col], threshold), axis=1)
    print(check.any())
However, it obviously fails because if x.dtype == int or x.dtype == float does not work.
Upvotes: 0
Views: 291
Reputation: 76
Your function could be:
def test_row(x, threshold):
    if isinstance(x, (int, float)) and x:
        return x > threshold
The second condition in the function (and x) only checks that x is truthy; if x is empty or zero, the comparison is skipped and the function returns None.
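As an alternative to testing each cell's type, the check could be done per column. The following is only a sketch under a few assumptions: the helper name deviates_from_median and the tolerance parameter are made up for illustration, non-numeric entries such as "a" are simply ignored by coercing them to NaN with pd.to_numeric, and "deviates by more than 20%" is read as |value - median| > 0.20 * |median|, which is not the same comparison as in the question's code.

```python
import pandas as pd


def deviates_from_median(df, columns, tolerance=0.20):
    """For each column, report whether any numeric value differs from the
    column median by more than `tolerance` (relative to the median).

    Hypothetical helper for illustration; not part of pandas.
    """
    result = {}
    for col in columns:
        # Coerce mixed content to numbers; strings like "a" become NaN.
        values = pd.to_numeric(df[col], errors="coerce")
        median = values.median()  # NaN entries are skipped by default
        deviation = (values - median).abs()
        # Comparisons against NaN evaluate to False, so NaN rows never flag.
        result[col] = bool((deviation > tolerance * abs(median)).any())
    return result


# Sample data mirroring the DataFrame in the question.
df = pd.DataFrame({
    "col1": [50, 2, 5, "a"],
    "col2": ["dd", "r", "d", "e"],
    "col3": [3, float("nan"), 4, 5],
})

flags = deviates_from_median(df, ["col1", "col3"])
print(flags)
```

With the sample data both columns are flagged, since col1 has a median of 5 and col3 a median of 4, and each contains values more than 20% away from it. If non-numeric entries should raise an error instead of being ignored, the errors="coerce" argument can be dropped.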
Upvotes: 1