Rahul rajan

Reputation: 1266

Getting an error with Numpy where condition

I am trying to create a new column using np.where conditions on other columns in the DataFrame.

My code

    df5['RiskSubType'] = np.where(
        new_df['Snow_Risk'] == 1,
        ' Heavy Snow forecasted at  ' + df5.LOCATION.mask(new_df.LOCATION == '', df5.LOCATION_CITY),
        np.where(
            df5['Wind_Risk'] == 1,
            ' Heavy Wind forecasted at  ' + df5.LOCATION.mask(df5.LOCATION == '', df5.LOCATION_CITY),
            np.where(
                df5['Precip_Risk'] == 1,
                ' Heavy Rain forecasted at  ' + df5.LOCATION.mask(df5.LOCATION == '', df5.LOCATION_CITY),
                "No Risk Identified")))

Error

ValueError: operands could not be broadcast together with shapes

How can I fix this, or is there an alternative way to do this?

Upvotes: 1

Views: 789

Answers (2)

LeoE

Reputation: 2083

First of all, your code is really hard to read; you should think about simplifying it. Your problem occurs because you are trying to combine strings and arrays in the np.where function. The documentation says:

numpy.where(condition[, x, y])

Return elements chosen from x or y depending on condition.

Parameters:

condition : array_like, bool

Where True, yield x, otherwise yield y.
x, y : array_like
Values from which to choose. x, y and condition need to be broadcastable to some shape.

Returns:

out : ndarray

An array with elements from x where condition is True, and elements from y elsewhere.
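To see what this means for the condition, x and y arguments, here is a minimal sketch with made-up arrays: a scalar (such as a string) broadcasts against any shape, while an array whose length differs from the condition's cannot.

```python
import numpy as np

cond = np.array([True, False, True])   # shape (3,)
x = np.array(["a", "b", "c"])          # shape (3,)

# The scalar "-" has shape () and broadcasts to (3,), so this works
out = np.where(cond, x, "-")
print(out)

# An x of shape (2,) cannot be stretched to match the condition's shape (3,)
try:
    np.where(cond, np.array(["a", "b"]), "-")
except ValueError as err:
    print("ValueError:", err)
```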

As you can see, x, y and condition need to be broadcastable to a common shape. Here is how broadcasting works:

6.4. Broadcasting

Another powerful feature of Numpy is broadcasting. Broadcasting takes place when you perform operations between arrays of different shapes. For instance

>>> import numpy as np
>>> a = np.array([
...     [0, 1],
...     [2, 3],
...     [4, 5],
...     ])
>>> b = np.array([10, 100])
>>> a * b
array([[  0, 100],
       [ 20, 300],
       [ 40, 500]])

The shapes of a and b don’t match. In order to proceed, Numpy will stretch b into a second dimension, as if it were stacked three times upon itself. The operation then takes place element-wise.

One of the rules of broadcasting is that only dimensions of size 1 can be stretched (if an array only has one dimension, all other dimensions are considered for broadcasting purposes to have size 1). In the example above b is 1D, and has shape (2,). For broadcasting with a, which has two dimensions, Numpy adds another dimension of size 1 to b. b now has shape (1, 2). This new dimension can now be stretched three times so that b’s shape matches a’s shape of (3, 2).
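The padding-and-stretching steps described above can be made explicit with np.broadcast_to. A minimal sketch reusing a and b from the example:

```python
import numpy as np

a = np.array([[0, 1], [2, 3], [4, 5]])  # shape (3, 2)
b = np.array([10, 100])                 # shape (2,)

# Step 1: treat b as if it had a leading dimension of size 1
b_row = b.reshape(1, 2)
print(b_row.shape)  # (1, 2)

# Step 2: stretch that size-1 dimension three times to match a's shape
b_stretched = np.broadcast_to(b_row, a.shape)
print(b_stretched.shape)  # (3, 2)

# Multiplying by the explicitly stretched copy equals the broadcast a * b
print(np.array_equal(a * b, a * b_stretched))  # True
```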

The other rule is that dimensions are compared from the last to the first. Any dimensions that do not match must be stretched to become equally sized. However, according to the previous rule, only dimensions of size 1 can stretch. This means that some shapes cannot broadcast and Numpy will give you an error:

>>> c = np.array([
...     [0, 1, 2],
...     [3, 4, 5],
...     ])
>>> b = np.array([10, 100])
>>> c * b
ValueError: operands could not be broadcast together with shapes (2,3) (2,)

What happens here is that Numpy, again, adds a dimension to b, making it of shape (1, 2). The sizes of the last dimensions of b and c (2 and 3, respectively) are then compared and found to differ. Since neither of these dimensions has size 1 (and so neither can be stretched), Numpy gives up and produces an error.
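The two rules can also be written out by hand. The function broadcast_shape below is a hypothetical illustration, not a NumPy function: it pads the shorter shape with 1s, compares dimensions from last to first, and only lets size-1 dimensions stretch. NumPy's own np.broadcast_shapes (available since NumPy 1.20) performs the same check.

```python
import numpy as np


def broadcast_shape(shape1, shape2):
    """Return the broadcast shape of two shapes, or raise ValueError.

    Hypothetical helper written to illustrate the rules; not part of NumPy.
    """
    # Rule 1: a missing leading dimension counts as size 1, so pad on the left
    ndim = max(len(shape1), len(shape2))
    s1 = (1,) * (ndim - len(shape1)) + tuple(shape1)
    s2 = (1,) * (ndim - len(shape2)) + tuple(shape2)

    result = []
    # Rule 2: compare dimensions from last to first; only size-1 dims stretch
    for d1, d2 in zip(reversed(s1), reversed(s2)):
        if d1 == d2 or d2 == 1:
            result.append(d1)
        elif d1 == 1:
            result.append(d2)
        else:
            raise ValueError(f"shapes {shape1} and {shape2} are not broadcastable")
    return tuple(reversed(result))


print(broadcast_shape((3, 2), (2,)))        # (3, 2): the a * b case
print(broadcast_shape((2, 3), (2, 1)))      # (2, 3): the c * b[:, None] case
print(np.broadcast_shapes((2, 3), (2, 1)))  # (2, 3): NumPy's built-in check

try:
    broadcast_shape((2, 3), (2,))  # the failing c * b case
except ValueError as err:
    print(err)
```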

The solution to multiplying c and b above is to specifically tell Numpy that it must add that extra dimension as the second dimension of b. This is done by using None to index that second dimension. The shape of b then becomes (2, 1), which is compatible for broadcasting with c:

>>> c = np.array([
...     [0, 1, 2],
...     [3, 4, 5],
...     ])
>>> b = np.array([10, 100])
>>> c * b[:, None]
array([[  0,  10,  20],
       [300, 400, 500]])

A good visual description of these rules, together with some advanced broadcasting applications, can be found in this tutorial on Numpy broadcasting rules.

So the problem is that you are trying to broadcast an (n,) array (the first where) against a scalar (the first string), an (m,) array (the second where), another scalar (the second string), a (k,) array (the third where), and so on. Since n, m and k can differ, the non-matching dimensions cannot be stretched and broadcasting fails.
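A small reproduction of this failure, as a sketch: the question's first condition reads new_df['Snow_Risk'] while the string values are built from df5, so if the two frames have different row counts (an assumption; the question does not show their sizes), np.where receives arrays of different lengths. The frames below are hypothetical stand-ins with made-up values.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's df5 and new_df, with different row counts
df5 = pd.DataFrame({"Wind_Risk": [1, 0, 0], "LOCATION": ["A", "B", "C"]})
new_df = pd.DataFrame({"Snow_Risk": [0, 1]})

try:
    # The condition has shape (2,), while the string column has shape (3,)
    np.where(new_df["Snow_Risk"] == 1, "Snow at " + df5["LOCATION"], "No Risk Identified")
except ValueError as err:
    print("ValueError:", err)

# When the condition and the values come from the same DataFrame, the shapes agree
result = np.where(df5["Wind_Risk"] == 1, "Wind at " + df5["LOCATION"], "No Risk Identified")
print(result)
```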

Upvotes: 1

Joe

Reputation: 7121

Please provide a small sample of your data, something like this:

import pandas as pd

d = {'LOCATION': ['?', '?'],
     'LOCATION_CITY': ['?', '?'],
     'Wind_Risk': [1, 0],
     'Precip_Risk': [1, 0],
     'Snow_Risk': [1, 0]}

df = pd.DataFrame(data=d)
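With sample data of that shape, one alternative to nesting np.where calls is np.select, which takes a list of conditions and a matching list of choices and uses the first condition that is True. This is a sketch, not code from either answer: the values below are made up, and it assumes a missing LOCATION is stored as an empty string (not NaN).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data in the shape suggested above (values are made up)
df = pd.DataFrame({
    "LOCATION": ["Main St", "", "Harbor Rd", "Elm St"],
    "LOCATION_CITY": ["Springfield", "Shelbyville", "Ogdenville", "Springfield"],
    "Wind_Risk": [1, 1, 0, 0],
    "Precip_Risk": [0, 0, 1, 0],
    "Snow_Risk": [1, 0, 0, 0],
})

# Fall back to LOCATION_CITY where LOCATION is an empty string
place = df["LOCATION"].mask(df["LOCATION"] == "", df["LOCATION_CITY"])

# Conditions are checked in order (Snow, then Wind, then Precip),
# matching the priority of the nested np.where calls in the question
conditions = [df["Snow_Risk"] == 1, df["Wind_Risk"] == 1, df["Precip_Risk"] == 1]
choices = [
    "Heavy Snow forecasted at " + place,
    "Heavy Wind forecasted at " + place,
    "Heavy Rain forecasted at " + place,
]

# Every condition and choice comes from the same DataFrame, so all shapes agree
df["RiskSubType"] = np.select(conditions, choices, default="No Risk Identified")
print(df["RiskSubType"].tolist())
```

Because every array is derived from a single DataFrame, there is no chance of mixing lengths the way the question's new_df and df5 references can.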

Upvotes: 1
