Xianghui Huang

Reputation: 49

Dataframe greater than zero and less than zero

I have a large dataframe like this, with millions of rows, and I want to apply an operation to it quickly.

df value
   10
   -1
   20
   ...
   -3
   -4
   -50
   12

I want the most efficient way to do the following: if a value is greater than 0, multiply it by 2; if it is less than 0, multiply it by 3. The output should be a dataframe like this:

df value
   20
   -3
   40
   ...
   -9
   -12
   -150
   24

My script is:

dff = df.value
for i in range(len(dff)):
    if dff[i] > 0:
        dff[i] = dff[i] * 2
    elif dff[i] < 0:
        dff[i] = dff[i] * 3

Upvotes: 0

Views: 1862

Answers (1)

tozCSS

Reputation: 6114

Let s be:

s = pd.Series(np.random.randint(-10,11,10**6))

The fastest of the alternatives timed here:

%%time
y = np.where(s > 0, s * 2, s * 3)

Timed:

CPU times: user 11.3 ms, sys: 2.21 ms, total: 13.5 ms
Wall time: 11.8 ms
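
Note that np.where returns a NumPy array rather than a Series. Here is a minimal sketch of writing the result back into the dataframe from the question, assuming the column is named value as in the question's script. The sample values are the ones shown in the question, with the elided rows left out.

```python
import numpy as np
import pandas as pd

# Sample values from the question (the elided "..." rows are omitted).
df = pd.DataFrame({"value": [10, -1, 20, -3, -4, -50, 12]})

# Vectorized: double positives, triple everything else.
# Zero falls into the "else" branch, but 0 * 3 is still 0.
df["value"] = np.where(df["value"] > 0, df["value"] * 2, df["value"] * 3)

print(df["value"].tolist())  # [20, -3, 40, -9, -12, -150, 24]
```

Assigning the array to the column keeps the existing index, so no extra alignment step should be needed.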

Your solution:

%%time
for i in range(len(s)):
    if s[i] > 0:
        s[i] = s[i] * 2
    elif s[i] < 0:
        s[i] = s[i] * 3

Timed:

CPU times: user 17.7 s, sys: 51.3 ms, total: 17.8 s
Wall time: 17.9 s

An alternative:

%%time
y = s.map(lambda x: x*2 if x>0 else x*3)

Timed:

CPU times: user 308 ms, sys: 37.5 ms, total: 345 ms
Wall time: 371 ms

Another alternative:

%%time
mask = s > 0
# where() keeps values where the condition is True and replaces the rest
y = s.where(~mask, s * 2).where(mask, s * 3)

Timed:

CPU times: user 31 ms, sys: 7.43 ms, total: 38.4 ms
Wall time: 37.2 ms
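
As a sanity check, the vectorized variants can be compared on a small sample before timing them on the full data. This is a sketch that uses the sample values from the question plus a zero. The multiplier form at the end is an extra variant that was not timed here, so no speed claim is made for it.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, -1, 20, 0, -3, -4, -50, 12])
expected = [20, -3, 40, 0, -9, -12, -150, 24]

mask = s > 0

# np.where returns a NumPy array, so wrap it to get a Series back.
y_np = pd.Series(np.where(mask, s * 2, s * 3), index=s.index)

# Element-wise Python function: flexible, but slower on large inputs.
y_map = s.map(lambda x: x * 2 if x > 0 else x * 3)

# Series.where keeps values where the condition is True and
# substitutes the second argument where it is False.
y_where = s.where(~mask, s * 2).where(mask, s * 3)

# Multiplier form: pick the factor per element, then multiply once.
y_factor = s * np.where(mask, 2, 3)

for y in (y_np, y_map, y_where, y_factor):
    assert y.tolist() == expected
```

If any variant disagrees with the others, the assertion fails, which is a cheap way to catch a swapped condition in a where chain.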

Upvotes: 1
