Reputation: 1869
What is the fastest way to achieve the following?
I'm using a pandas DataFrame (NxN) and I want to iterate over each row and each element to check whether the element is greater than the row's mean. If it is greater, I want to change the element's value to 1.
I calculate the mean value of a row using:
mean_value = df.loc[elementid].mean()  # .ix was removed in pandas 1.0; .loc selects the row by label
But iterating over each element with a nested loop and checking whether it is >= mean_value is really slow.
Upvotes: 3
Views: 781
Reputation: 862481
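You can first compute the mean of each row with mean(axis=1), then compare every element against its row mean with ge, and finally use mask to write 1 wherever the comparison is True: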
print(df)
   a  b  c
0  0  1  2
1  0  1  2
2  1  1  2
3  1  0  1
4  1  1  2
5  0  0  1
# Mean of each row (axis=1 aggregates across the columns)
mean_value = df.mean(axis=1)
print(mean_value)
0    1.000000
1    1.000000
2    1.333333
3    0.666667
4    1.333333
5    0.333333
# axis=0 aligns mean_value with the row index, so each row is compared with its own mean
mask = df.ge(mean_value, axis=0)
print(mask)
       a      b      c
0  False   True   True
1  False   True   True
2  False  False   True
3   True  False   True
4  False  False   True
5  False  False   True
# Replace every element where mask is True with 1; the others are left unchanged
print(df.mask(mask, 1))
   a  b  c
0  0  1  1
1  0  1  1
2  1  1  1
3  1  0  1
4  1  1  1
5  0  0  1
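For convenience, here is a minimal self-contained sketch that puts the steps above into one script. The helper names `set_one_at_or_above_row_mean` and `set_one_numpy` are made up for illustration and are not part of pandas. The second helper is a NumPy-based variant of the same idea that I have added as an alternative; it may be faster on large frames, but that is an assumption you should time on your own data.

```python
import numpy as np
import pandas as pd

# Same sample data as in the example above.
df = pd.DataFrame({"a": [0, 0, 1, 1, 1, 0],
                   "b": [1, 1, 1, 0, 1, 0],
                   "c": [2, 2, 2, 1, 2, 1]})


def set_one_at_or_above_row_mean(frame):
    """Hypothetical helper: return a copy with 1 wherever element >= row mean."""
    row_mean = frame.mean(axis=1)              # one mean per row
    at_or_above = frame.ge(row_mean, axis=0)   # compare each row with its own mean
    return frame.mask(at_or_above, 1)          # write 1 where the comparison is True


def set_one_numpy(frame):
    """Hypothetical NumPy variant of the same idea (an alternative, not the method above)."""
    values = frame.to_numpy()
    row_mean = values.mean(axis=1, keepdims=True)  # shape (n_rows, 1), broadcasts per row
    replaced = np.where(values >= row_mean, 1, values)
    return pd.DataFrame(replaced, index=frame.index, columns=frame.columns)


result = set_one_at_or_above_row_mean(df)
result_np = set_one_numpy(df)
print(result)
```

Both helpers return a new DataFrame and leave `df` untouched, so assign the result back if you want to overwrite the original.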
Upvotes: 6