Reputation: 49
I have a large dataframe like the one below, with millions of rows, and I want to apply a transformation to it quickly.
df value
10
-1
20
...
-3
-4
-50
12
I want to know the most efficient way to do the following: if a value is greater than 0, multiply it by 2; if it is less than 0, multiply it by 3. The output should be a dataframe like this:
df value
20
-3
40
...
-9
-12
-150
24
My script is:
dff = df.value
for i in range(len(dff)):
    if dff[i] > 0:
        dff[i] = dff[i] * 2
    elif dff[i] < 0:
        dff[i] = dff[i] * 3
Upvotes: 0
Views: 1862
Reputation: 6114
Let s be:
import numpy as np
import pandas as pd
s = pd.Series(np.random.randint(-10, 11, 10**6))
Fastest of the alternatives (note that np.where returns a NumPy array, not a Series):
%%time
y = np.where(s > 0, s * 2, s * 3)
Timed:
CPU times: user 11.3 ms, sys: 2.21 ms, total: 13.5 ms
Wall time: 11.8 ms
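Applied to the frame from the question, this could look like the minimal sketch below. It assumes the column is named value, as in the question, and uses only the sample values shown there. Zeros come out unchanged either way, since 0 * 3 is still 0.

```python
import numpy as np
import pandas as pd

# Sample values taken from the question; the real frame has millions of rows.
df = pd.DataFrame({"value": [10, -1, 20, -3, -4, -50, 12]})

# np.where returns a NumPy array, so assign it back to the column.
df["value"] = np.where(df["value"] > 0, df["value"] * 2, df["value"] * 3)

print(df["value"].tolist())  # [20, -3, 40, -9, -12, -150, 24]
```

Assigning the whole column at once also avoids the element-by-element writes of the loop.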
Your solution:
%%time
for i in range(len(s)):
    if s[i] > 0:
        s[i] = s[i] * 2
    elif s[i] < 0:
        s[i] = s[i] * 3
Timed:
CPU times: user 17.7 s, sys: 51.3 ms, total: 17.8 s
Wall time: 17.9 s
An alternative:
%%time
y = s.map(lambda x: x*2 if x>0 else x*3)
Timed:
CPU times: user 308 ms, sys: 37.5 ms, total: 345 ms
Wall time: 371 ms
Another alternative:
%%time
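mask = s > 0
# where() keeps values where the condition is True and replaces the rest:
# first double the positive values, then triple the non-positive ones.
y = s.where(~mask, s * 2).where(mask, s * 3)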
Timed:
CPU times: user 31 ms, sys: 7.43 ms, total: 38.4 ms
Wall time: 37.2 ms
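When comparing alternatives like these, it is worth checking that they return the same values before looking at the timings. A minimal sketch, assuming a fixed seed and a smaller series than the one timed above (both are choices made for this illustration); the chained where is written so that positive values are doubled and the rest tripled. The loop is left out because it modifies s in place.

```python
import numpy as np
import pandas as pd

# Fixed seed and a smaller size so the check is quick and reproducible.
rng = np.random.default_rng(0)
s = pd.Series(rng.integers(-10, 11, 10**5))

# Reference result: double positive values, triple everything else.
expected = np.where(s > 0, s * 2, s * 3)

# Element-wise map with a Python lambda.
via_map = s.map(lambda x: x * 2 if x > 0 else x * 3)

# Chained Series.where: where() keeps values where the condition is True.
mask = s > 0
via_where = s.where(~mask, s * 2).where(mask, s * 3)

assert np.array_equal(expected, via_map.to_numpy())
assert np.array_equal(expected, via_where.to_numpy())
```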
Upvotes: 1