Reputation: 947
I have a pandas Series that contains numbers between 0 and 1. If a number is < 0.5 I need to multiply it by 10; otherwise I need to multiply it by 20.
I can do something like this to multiply everything by 20:
outcome = 20 * my_series
And I can iterate over the entire series and do it one by one like this:
outcome = []  # collect the results in a plain list
for i, v in my_series.items():  # iteritems() was removed in pandas 2.0
    if v >= 0.5:
        mul = 20
    else:
        mul = 10
    outcome.append(mul * v)
However, the second way is much slower, and I was wondering whether there is a better way to handle this case.
Upvotes: 0
Views: 102
Reputation: 2830
I'm new to pandas, so this might not be the most efficient answer, but I'll throw it out there because it seems to work:
pandas.Series.where(my_series*10, cond=my_series<0.5, other=my_series*20)
Playing around with different versions, I also came up with the following, but I'm assuming that the above is more efficient since it's built in.
In place version:
my_series[my_series>=0.5] *= 20
my_series[my_series<0.5] *= 10
Inline version:
(my_series < 0.5)*(my_series*10) + (my_series >=0.5)*(my_series*20)
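As a quick sanity check, the three variants above can be compared on a small hand-made series. This is only a sketch: the sample values and the names `sample`, `via_where`, `via_mask`, and `via_inplace` are made up for illustration, and the `where` call is written in method form, which should behave the same as the call above.

```python
import pandas as pd

# Hypothetical sample data; the values are chosen so the products are exact floats.
sample = pd.Series([0.25, 0.5, 0.125, 0.75])

# Series.where keeps values where the condition is True and takes `other` elsewhere.
via_where = (sample * 10).where(sample < 0.5, other=sample * 20)

# Boolean masks behave like 0/1 multipliers in arithmetic.
via_mask = (sample < 0.5) * (sample * 10) + (sample >= 0.5) * (sample * 20)

# In-place version, run on a copy so `sample` itself stays untouched.
via_inplace = sample.copy()
via_inplace[via_inplace >= 0.5] *= 20
via_inplace[via_inplace < 0.5] *= 10

print(via_where.tolist())
print(via_mask.tolist())
print(via_inplace.tolist())
```

Note that the order of the two in-place statements matters: scaling the values below 0.5 first would push some of them to 0.5 or above, and they would then be multiplied a second time.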
Update
Just out of curiosity, I tried a quick timeit test of the above, and was somewhat surprised by the results:
>>> import timeit
>>> setup = """
... import random, pandas
... random.seed('skdfjaiswe')
... my_series = pandas.Series([random.random() for idx in range(1000)])
... """
>>> print(min(timeit.Timer("pandas.Series.where(my_series*10, cond=my_series<0.5, other=my_series*20)", setup=setup).repeat(7, 1000)))
0.758988142014
>>> print(min(timeit.Timer("my_series[my_series>=0.5] *= 20; my_series[my_series<0.5] *= 10", setup=setup).repeat(7, 1000)))
9.13403320312
>>> print(min(timeit.Timer("(my_series < 0.5)*(my_series*10) + (my_series >=0.5)*(my_series*20)", setup=setup).repeat(7, 1000)))
0.612030029297
Unless I did something wrong here (anyone?), it appears that at least for this example the self-vectorized version is a little faster.
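One caveat about the in-place timing: `timeit` runs the setup once per repetition and then executes the statement many times against the same `my_series`, so the in-place statement keeps rescaling values that earlier iterations already changed. A rough sketch of a fairer comparison is below, assuming Python 3 and a recent pandas. The function names and the `repeat`/`number` values are made up for illustration, and the in-place variant works on a copy, which adds some overhead of its own.

```python
import random
import timeit

import pandas as pd

random.seed("skdfjaiswe")
my_series = pd.Series([random.random() for _ in range(1000)])


def use_where():
    return (my_series * 10).where(my_series < 0.5, other=my_series * 20)


def use_masks():
    return (my_series < 0.5) * (my_series * 10) + (my_series >= 0.5) * (my_series * 20)


def use_inplace():
    # Copy first so every call sees the same input data.
    s = my_series.copy()
    s[s >= 0.5] *= 20
    s[s < 0.5] *= 10
    return s


timings = {}
for func in (use_where, use_masks, use_inplace):
    # Hypothetical repeat/number values, smaller than in the run above.
    timings[func.__name__] = min(timeit.repeat(func, repeat=3, number=100))
    print(func.__name__, timings[func.__name__])
```

The absolute numbers will depend on the machine and the pandas version, so only the relative ordering is worth reading from the output.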
Upvotes: 1