BLimitless

Reputation: 2575

Vectorization & ValueError, but not from "or" and "and" operators

This question and answer chain does a great job explaining how to resolve the ValueErrors that come up when using conditionals, e.g. "or" instead of |, and "and" instead of &. But I don't see anything in that chain that resolves "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()" when trying to vectorize a function that was written to take a single number as input.

Specifically, map and apply work fine in this case, but passing the whole Series to the function (my attempt at vectorization) still throws the ValueError.

The code is below. Can someone share how to fix this so vectorization can be used without modifying the function (or without modifying it too much)? Thank you!


Code:

# import numpy and pandas, create dataframe.
import numpy as np
import pandas as pd
x = range(1000)
df = pd.DataFrame(data = x, columns = ['Number']) 

# define simple function to return True or False if number passed in is prime
def is_prime(num):
    if num < 2:
        return False
    elif num == 2: 
        return True
    else: 
        for i in range(2,num):
            if num % i == 0:
                return False
    return True

# Call various ways of applying the function to the data frame
df['map prime'] = list(map(is_prime, df['Number']))
df['apply prime'] = df['Number'].apply(is_prime)

# look at dataframe
in: df.head()
out:
   Number  map prime  apply prime
0       0      False        False
1       1      False        False
2       2       True         True
3       3       True         True
4       4      False        False

# now try vectorizing by passing the whole Series to the function
in: df['optimize prime'] = is_prime(df['Number'])
out: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
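The error comes from the first if statement in is_prime. When num is a whole Series, num < 2 is evaluated element-wise and produces a boolean Series; the if statement then has to collapse that Series into a single True/False by calling bool() on it, which pandas refuses to do because the answer is ambiguous. A minimal sketch reproducing this outside the function, using a small stand-in Series:

```python
import pandas as pd

# A small Series standing in for df['Number'] from the question
s = pd.Series([0, 1, 2, 3, 4])

# Comparing a Series with a scalar is element-wise and returns a boolean Series...
mask = s < 2
print(mask.tolist())  # [True, True, False, False, False]

# ...but `if mask:` needs one True/False, so Python calls bool(mask),
# and pandas raises the same ValueError seen in the question.
try:
    bool(mask)
except ValueError as err:
    print("ValueError:", err)

# Reducing the Series explicitly gives an unambiguous scalar
print(mask.any(), mask.all())  # True False
```

This is why the function works for one number at a time (map, apply) but not for a whole Series: the branching logic assumes a scalar.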

Upvotes: 1

Views: 124

Answers (2)

hpaulj

Reputation: 231395

In [133]: timeit df['map prime'] = list(map(is_prime, df['Number']))
6.25 ms ± 65.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [134]: timeit df['apply prime'] = df['Number'].apply(is_prime)
6.18 ms ± 27.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

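np.vectorize comes with an important disclaimer: it is not a performance tool. Its documentation says it is provided primarily for convenience, not for performance, and that the implementation is essentially a for loop. Timing it with vis_prime = np.vectorize(is_prime), as defined in the other answer: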

In [137]: timeit df['optimize prime'] = vis_prime(df['Number'])
5.83 ms ± 7.28 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

It isn't much faster than pandas' own apply iterator.

And with a simple list comprehension:

In [140]: timeit [is_prime(num) for num in x]
5.65 ms ± 10 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
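The In [...] timings above use IPython's %timeit magic. A rough plain-Python equivalent using the standard timeit module is sketched below; the repetition count (20) is an arbitrary choice, and absolute numbers will vary from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

# is_prime exactly as defined in the question
def is_prime(num):
    if num < 2:
        return False
    elif num == 2:
        return True
    else:
        for i in range(2, num):
            if num % i == 0:
                return False
    return True

x = range(1000)
df = pd.DataFrame(data=x, columns=['Number'])
vis_prime = np.vectorize(is_prime)

# Each entry is a zero-argument callable that timeit can run repeatedly
candidates = {
    'map': lambda: list(map(is_prime, df['Number'])),
    'apply': lambda: df['Number'].apply(is_prime),
    'np.vectorize': lambda: vis_prime(df['Number']),
    'list comprehension': lambda: [is_prime(num) for num in x],
}

runs = 20  # arbitrary; more runs give steadier averages
timings = {}
for name, fn in candidates.items():
    # timeit.timeit returns total seconds for `runs` calls; convert to ms per call
    timings[name] = timeit.timeit(fn, number=runs) / runs * 1e3
    print(f'{name:>18}: {timings[name]:.2f} ms per loop')
```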

To get real "vectorized" speedup, we have to write code that works with a whole array, or in this case, a Series.
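As an illustration of working on the whole array, here is a minimal sketch that replaces the per-number Python loop with NumPy broadcasting: every number is tested against every candidate divisor up to the square root of the largest value in one operation. This rewrites the logic rather than reusing is_prime, and is_prime_array is a hypothetical helper name, not a library function. It builds an intermediate boolean table of shape (len(nums), sqrt(max)), so memory grows with both the number of values and their size.

```python
import math

import numpy as np
import pandas as pd

def is_prime_array(nums):
    """Return a boolean array that is True where the corresponding value is prime.

    Hypothetical helper: vectorized trial division via broadcasting.
    """
    nums = np.asarray(nums, dtype=np.int64)
    if nums.size == 0:
        return np.zeros(0, dtype=bool)
    # Divisors 2..isqrt(max) suffice: any composite n <= max has a factor <= isqrt(n)
    limit = math.isqrt(max(int(nums.max()), 0))
    divisors = np.arange(2, limit + 1)
    # Table of shape (len(nums), len(divisors)):
    # True where divisor d divides n and is smaller than n (a proper divisor)
    proper = (nums[:, None] % divisors[None, :] == 0) & (divisors[None, :] < nums[:, None])
    # Prime means at least 2 and no proper divisor in the table
    return (nums >= 2) & ~proper.any(axis=1)

df = pd.DataFrame(data=range(1000), columns=['Number'])
df['array prime'] = is_prime_array(df['Number'])
print(df.head())
```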

There's an inherent serial quality to this function. In particular, the check

        if num % i == 0:
            return False

short-circuits, returning as soon as a divisor is found, without iterating through the whole range(2,num). Some prime sieves also build on results for previous numbers.
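A sieve of Eratosthenes is one common example of that kind of approach. The sketch below (prime_sieve is a hypothetical helper name) marks the multiples of each prime with a single NumPy slice assignment, so the only Python-level loop runs up to the square root of n. It assumes the values in Number are non-negative integers, so they can be used directly as indices into the sieve.

```python
import math

import numpy as np
import pandas as pd

def prime_sieve(n):
    """Sieve of Eratosthenes (hypothetical helper name).

    Returns a boolean array `sieve` of length n where sieve[k] is True iff k is prime.
    """
    sieve = np.ones(n, dtype=bool)
    sieve[:2] = False  # 0 and 1 are not prime
    # Only primes up to isqrt(n) need to cross anything off
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            # One vectorized slice assignment marks every multiple of p from p*p upward
            sieve[p * p::p] = False
    return sieve

df = pd.DataFrame(data=range(1000), columns=['Number'])
numbers = df['Number'].to_numpy()
# Assumes non-negative integers, used directly as indices into the sieve
df['sieve prime'] = prime_sieve(int(numbers.max()) + 1)[numbers]
print(df.head())
```

Unlike trial division per number, the sieve does its work once for the whole range, which is where the "build on previous numbers" idea pays off.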

Upvotes: 2

rodrigocfaria

Reputation: 440

You could try out numpy's vectorize:

vis_prime = np.vectorize(is_prime)

df['optimize prime'] = vis_prime(df['Number'])

That gives you:

   Number  map prime  apply prime  optimize prime
0       0      False        False           False
1       1      False        False           False
2       2       True         True            True
3       3       True         True            True
4       4      False        False           False
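One detail worth knowing about np.vectorize: unless otypes is given, it determines the output type by calling the function on the first input element. Passing otypes=[bool] declares the output dtype up front. It still calls is_prime once per element in a Python-level loop, so it should not be expected to be much faster than map or apply. A sketch, reusing the question's function unchanged:

```python
import numpy as np
import pandas as pd

# is_prime exactly as defined in the question
def is_prime(num):
    if num < 2:
        return False
    elif num == 2:
        return True
    else:
        for i in range(2, num):
            if num % i == 0:
                return False
    return True

# otypes=[bool] declares the output dtype, so np.vectorize does not
# have to infer it from a trial call on the first element
vis_prime = np.vectorize(is_prime, otypes=[bool])

df = pd.DataFrame(data=range(1000), columns=['Number'])
df['optimize prime'] = vis_prime(df['Number'])
print(df.head())
print(df['optimize prime'].dtype)  # bool
```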

Upvotes: 2
