J-H

Reputation: 1869

Python Pandas DataFrame: update values efficiently

What is the fastest way to achieve the following?

I'm using a Pandas DataFrame (NxN) and I want to iterate over each row and each element to check whether the element is greater than the row's mean. If it is greater, I want to change the element's value to 1.

I calculate the mean value using:

mean_value = df.loc[elementid].mean()

but iterating over each element with a nested loop and checking whether it is >= mean_value is really slow.

Upvotes: 3

Views: 781

Answers (1)

jezrael

Reputation: 862481

You can first compute the mean of each row, then compare each element against its row's mean with ge, and finally use mask to set the value to 1 wherever the comparison is True:

print(df)
   a  b  c
0  0  1  2
1  0  1  2
2  1  1  2
3  1  0  1
4  1  1  2
5  0  0  1

mean_value = df.mean(axis=1)
print(mean_value)
0    1.000000
1    1.000000
2    1.333333
3    0.666667
4    1.333333
5    0.333333

mask = df.ge(mean_value, axis=0)
print(mask)
       a      b     c
0  False   True  True
1  False   True  True
2  False  False  True
3   True  False  True
4  False  False  True
5  False  False  True
print(df.mask(mask, 1))
   a  b  c
0  0  1  1
1  0  1  1
2  1  1  1
3  1  0  1
4  1  1  1
5  0  0  1
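
For convenience, the steps above can be put together as one self-contained script. This is only a minimal sketch in Python 3 syntax: the DataFrame is rebuilt from the sample data printed above, and the names row_mean and result are just illustrative.

```python
import pandas as pd

# Sample data rebuilt from the printed frame above.
df = pd.DataFrame({
    "a": [0, 0, 1, 1, 1, 0],
    "b": [1, 1, 1, 0, 1, 0],
    "c": [2, 2, 2, 1, 2, 1],
})

# Mean of each row (axis=1 collapses the columns).
row_mean = df.mean(axis=1)

# Compare every element with the mean of its own row; axis=0 aligns
# the Series of row means with the DataFrame's row index.
mask = df.ge(row_mean, axis=0)

# Set the elements where the mask is True to 1, keep the rest unchanged.
result = df.mask(mask, 1)
print(result)
```

If only values strictly greater than the row mean should be replaced, as the first sentence of the question suggests, df.gt(row_mean, axis=0) can be used in place of df.ge.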

Upvotes: 6
