NSK

Reputation: 105

Operation with pandas Series

Suppose I have two Series, a = [1,2,3,4,5] and b = [60,7,80,9,100].

I would like to create a new variable C, calculated as follows: C = a/b if b > 10 else a/b + 1

I can do this with a list comprehension in the following way:

C = [a[i] / b[i] if b[i] > 10 else a[i] / b[i] + 1 for i in range(len(b))]

My question is the following:

Is there another way (e.g. using lambda, map, apply, etc.) to avoid the for loop? (Series a, b, c can also be part of a pd.DataFrame.)

Upvotes: 1

Views: 158

Answers (1)

jezrael

Reputation: 862661

The first idea is to divide the values and add 1 where the condition holds, by converting the boolean mask to integers 1 and 0:

c = a/b + (b <= 10).astype(int)
# alternative
# c = a/b + (~(b > 10)).astype(int)

Or add an array created by numpy.where:

c = a/b + np.where(b > 10, 0, 1)

It is also possible to compute the division in both branches, although this should be a bit slower on large data:

c = pd.Series(np.where(b > 10, a/b, a/b + 1), index=a.index)

print(c)
0    0.016667
1    1.285714
2    0.037500
3    1.444444
4    0.050000
dtype: float64
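
As a quick sanity check, the following sketch compares the three vectorized variants above against the list comprehension from the question on the small sample data. The names `expected`, `c_mask`, `c_where` and `c_full` are only illustrative and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

a = pd.Series([1, 2, 3, 4, 5])
b = pd.Series([60, 7, 80, 9, 100])

# Reference result: the explicit loop from the question
expected = pd.Series(
    [a[i] / b[i] if b[i] > 10 else a[i] / b[i] + 1 for i in range(len(b))]
)

# Variant 1: boolean mask converted to 0/1 and added to the quotient
c_mask = a / b + (b <= 10).astype(int)

# Variant 2: 0/1 offset array built with numpy.where
c_where = a / b + np.where(b > 10, 0, 1)

# Variant 3: both branches computed, then selected element-wise
c_full = pd.Series(np.where(b > 10, a / b, a / b + 1), index=a.index)

# All three should agree with the loop up to floating-point tolerance
print(np.allclose(c_mask, expected),
      np.allclose(c_where, expected),
      np.allclose(c_full, expected))
```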

Setup:

import numpy as np
import pandas as pd

a = pd.Series([1,2,3,4,5])
b = pd.Series([60,7,80,9,100])
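
The question also mentions that the Series may be columns of a DataFrame. A minimal sketch of that case, assuming the columns are named `a` and `b` and the result goes into a new column `c` (these column names are an assumption, not taken from the question):

```python
import pandas as pd

# Same sample data as above, held as DataFrame columns
df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [60, 7, 80, 9, 100]})

# Same mask trick: add 1 only on rows where b <= 10
df["c"] = df["a"] / df["b"] + (df["b"] <= 10).astype(int)

print(df)
```

The expression is unchanged apart from the column access, because DataFrame columns are themselves Series.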

Performance:

np.random.seed(2019)

a = pd.Series(np.random.randint(1,100, size=100000))
b = pd.Series(np.random.randint(1,100, size=100000))

In [322]: %timeit [a[i] /b[i] if b[i] > 10 else a[i] /b[i] +1 for i in range(len(b))]
3.08 s ± 84.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [323]: %timeit a/b + (b <=10).astype(int)
1.71 ms ± 44.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [324]: %timeit a/b + np.where(b > 10, 0, 1)
1.67 ms ± 66.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [325]: %timeit np.where(b >10, a/b, a/b +1)
2.7 ms ± 13.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [326]: %timeit pd.Series(np.where(b >10, a/b, a/b +1), index=a.index)
2.74 ms ± 21.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Upvotes: 1
