Reputation: 2575
This question and answer chain does a great job explaining how to resolve the ValueErrors that come up with conditionals, e.g. using "or" instead of |, or "and" instead of &. But I don't see anything in that chain that resolves "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()" when trying to vectorize a function that was written to take a single number as input.
Specifically, map and apply work fine in this case, but vectorization still throws the ValueError.
Code below, and can someone share how to fix this so vectorization can be used without modifying the function (or without modifying the function too much)? Thank you!
Code:
# import numpy and pandas, create dataframe.
import numpy as np
import pandas as pd
x = range(1000)
df = pd.DataFrame(data = x, columns = ['Number'])
# define simple function to return True or False if number passed in is prime
def is_prime(num):
    if num < 2:
        return False
    elif num == 2:
        return True
    else:
        for i in range(2,num):
            if num % i == 0:
                return False
        return True
# Call various ways of applying the function to the data frame
df['map prime'] = list(map(is_prime, df['Number']))
df['apply prime'] = df['Number'].apply(is_prime)
# look at dataframe
in: df.head()
out:
   Number  map prime  apply prime
0       0      False        False
1       1      False        False
2       2       True         True
3       3       True         True
4       4      False        False
# now try vectorizing
in: df['optimize prime'] = is_prime(df['Number'])
out: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Upvotes: 1
Views: 124
Reputation: 231395
In [133]: timeit df['map prime'] = list(map(is_prime, df['Number']))
6.25 ms ± 65.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [134]: timeit df['apply prime'] = df['Number'].apply(is_prime)
6.18 ms ± 27.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
np.vectorize has an important disclaimer: it is not a performance tool.
In [137]: timeit df['optimize prime'] = vis_prime(df['Number'])
5.83 ms ± 7.28 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
It isn't much faster than pandas' own apply iterator.
And with a simple list comprehension:
In [140]: timeit [is_prime(num) for num in x]
5.65 ms ± 10 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
To get real "vectorized" speedup, we have to write code that works with a whole array, or in this case, a Series.
There's an inherent serial quality to this function. In particular
    if num % i == 0:
        return False
short circuits, without iterating through the whole range(2,num). Some prime sieves also build on the results for previous values of num.
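As a rough illustration of what "working with a whole array" could look like here, this is a minimal sketch of a Sieve of Eratosthenes built on a numpy boolean mask. It swaps in a different algorithm rather than reusing is_prime, and the names primes_mask and 'sieve prime' are made up for this example. It also assumes the Number column holds non-negative integers.

```python
import math

import numpy as np
import pandas as pd


def primes_mask(n):
    """Return a boolean array m of length n where m[k] is True iff k is prime."""
    mask = np.ones(n, dtype=bool)
    mask[:2] = False  # 0 and 1 are not prime
    for i in range(2, math.isqrt(n) + 1):
        if mask[i]:
            # Cross out every multiple of i in one slice assignment,
            # starting at i*i (smaller multiples were already crossed out).
            mask[i * i::i] = False
    return mask


df = pd.DataFrame(data=range(1000), columns=['Number'])

# Build the mask once for 0..max, then look every value up by integer indexing.
mask = primes_mask(int(df['Number'].max()) + 1)
df['sieve prime'] = mask[df['Number'].to_numpy()]
```

The Python-level loop only runs up to the square root of the largest value, and the per-element work happens inside numpy slice assignments, so this is not the same shape of computation as calling is_prime once per row.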
Upvotes: 2
Reputation: 440
You could try out numpy's vectorize:
vis_prime = np.vectorize(is_prime)
df['optimize prime'] = vis_prime(df['Number'])
That gives you:
   Number  map prime  apply prime  optimize prime
0       0      False        False           False
1       1      False        False           False
2       2       True         True            True
3       3       True         True            True
4       4      False        False           False
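For anyone who wants to run this end to end, here is a self-contained sketch of the same approach, with is_prime left unchanged from the question. The otypes=[bool] argument is an addition that was not in the snippet above. As far as I know, np.vectorize otherwise calls the function on the first element to work out the output type, so setting it explicitly is optional here.

```python
import numpy as np
import pandas as pd


def is_prime(num):
    # Unchanged scalar function from the question.
    if num < 2:
        return False
    elif num == 2:
        return True
    else:
        for i in range(2, num):
            if num % i == 0:
                return False
        return True


df = pd.DataFrame(data=range(1000), columns=['Number'])

# np.vectorize wraps the scalar function so it is called once per element.
# It is still a Python-level loop underneath, not a compiled array operation.
vis_prime = np.vectorize(is_prime, otypes=[bool])
df['optimize prime'] = vis_prime(df['Number'])
```

The wrapped function hands is_prime one number at a time, so the `if num < 2` test never sees a whole Series and the ValueError does not come up.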
Upvotes: 2