Jordan

Reputation: 1495

My numpy select statement is giving a `shape mismatch` error

I'm trying to create a column based on two conditions using `np.select`. The conditions are in a list, and I have tested each one on its own to confirm it pulls the data as expected. However, applying the select statement raises the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-151-6994e3f46efb> in <module>
      8 replace = [600, 675, 710, 745, 999]
      9 
---> 10 train_df3_dummies['credit_C5_score'] = np.select(condition, replace, default = 1)

C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py in select(condlist, choicelist, default)
    698     # as the shape is needed for the result. Doing it separately optimizes
    699     # for example when all choices are scalars.
--> 700     condlist = np.broadcast_arrays(*condlist)
    701     choicelist = np.broadcast_arrays(*choicelist)
    702 

C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\stride_tricks.py in broadcast_arrays(*args, **kwargs)
    257     args = [np.array(_m, copy=False, subok=subok) for _m in args]
    258 
--> 259     shape = _broadcast_shape(*args)
    260 
    261     if all(array.shape == shape for array in args):

C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\stride_tricks.py in _broadcast_shape(*args)
    191     # use the old-iterator because np.nditer does not handle size 0 arrays
    192     # consistently
--> 193     b = np.broadcast(*args[:32])
    194     # unfortunately, it cannot handle 32 or more arguments directly
    195     for pos in range(32, len(args), 31):

ValueError: shape mismatch: objects cannot be broadcast to a single shape

Here is the code being used:

condition = [(train_df3_dummies.loc[(train_df3_dummies['credit_model_C5'] == 1) & (train_df3_dummies['credit_number'] == 600)])
             ,(train_df3_dummies.loc[(train_df3_dummies['credit_model_C5'] == 1) & (train_df3_dummies['credit_number'] == 675)])
             ,(train_df3_dummies.loc[(train_df3_dummies['credit_model_C5'] == 1) & (train_df3_dummies['credit_number'] == 710)])
             ,(train_df3_dummies.loc[(train_df3_dummies['credit_model_C5'] == 1) & (train_df3_dummies['credit_number'] == 745)])
             ,(train_df3_dummies.loc[(train_df3_dummies['credit_model_C5'] == 1) & (train_df3_dummies['credit_number'] == 999)])]


replace = [600, 675, 710, 745, 999]

train_df3_dummies['credit_C5_score'] = np.select(condition, replace, default = 1)

I've seen this error reported here for geometric problems, but not for numpy.select. Any ideas?

Upvotes: 1

Views: 1950

Answers (1)

Quang Hoang

Reputation: 150785

The mismatch is most likely caused by this:

train_df3_dummies.loc[(train_df3_dummies['credit_model_C5'] == 1 ....

That is, each of your conditions has a different length, because `.loc` returns the filtered rows rather than a boolean mask over the whole frame. Get rid of the `loc`, that is:

condition = [(train_df3_dummies['credit_model_C5'] == 1) & (train_df3_dummies['credit_number'] == 600),...
            ]
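
As a minimal sketch of the difference, assuming a small made-up frame `df` in place of `train_df3_dummies` (the data below is hypothetical, not taken from the question), the `.loc`-filtered frames have differing row counts while the boolean masks all share the length of the frame, so `np.select` can broadcast them:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for train_df3_dummies
df = pd.DataFrame({
    "credit_model_C5": [1, 1, 0, 1, 1, 0],
    "credit_number":   [600, 675, 600, 745, 500, 999],
})

replace = [600, 675, 710, 745, 999]

# Row counts of the .loc-filtered frames: these differ per condition,
# which is why they cannot be broadcast to a single shape
filtered_lengths = [
    len(df.loc[(df["credit_model_C5"] == 1) & (df["credit_number"] == value)])
    for value in replace
]

# Boolean masks: one True/False per row, so every mask has len(df) entries
condition = [
    (df["credit_model_C5"] == 1) & (df["credit_number"] == value)
    for value in replace
]

# Rows matching no condition fall back to the default
df["credit_C5_score"] = np.select(condition, replace, default=1)
```

Building the masks in a comprehension over `replace` is just a convenience here; writing the five conditions out by hand behaves the same way.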

You can also do:

s = ((train_df3_dummies['credit_model_C5'] == 1) &
     train_df3_dummies['credit_number'].isin(replace)
    )

train_df3_dummies['credit_C5_score'] = np.where(s,
                                                train_df3_dummies['credit_number'], 1)
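
The shortcut works here because every replacement value equals the `credit_number` it is matched on, so the matched rows can simply keep their own value. A small runnable sketch, again assuming a made-up frame `df` as a stand-in for `train_df3_dummies`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for train_df3_dummies
df = pd.DataFrame({
    "credit_model_C5": [1, 1, 0, 1, 1, 0],
    "credit_number":   [600, 675, 600, 745, 500, 999],
})

replace = [600, 675, 710, 745, 999]

# Single mask: the C5 flag is set AND credit_number is one of the listed values
s = (df["credit_model_C5"] == 1) & df["credit_number"].isin(replace)

# Keep credit_number where the mask holds, otherwise fall back to 1
df["credit_C5_score"] = np.where(s, df["credit_number"], 1)
```

If the replacement values ever differ from the values being matched, this shortcut no longer applies and the `np.select` form with boolean masks is the safer choice.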

Upvotes: 1
