Reputation: 1495
I'm trying to build a column of data based on two conditions using numpy.select. The conditions are in a list, and I've tested each one on its own to confirm it pulls the expected data. When I actually apply the select call, I get the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-151-6994e3f46efb> in <module>
8 replace = [600, 675, 710, 745, 999]
9
---> 10 train_df3_dummies['credit_C5_score'] = np.select(condition, replace, default = 1)
C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\function_base.py in select(condlist, choicelist, default)
698 # as the shape is needed for the result. Doing it separately optimizes
699 # for example when all choices are scalars.
--> 700 condlist = np.broadcast_arrays(*condlist)
701 choicelist = np.broadcast_arrays(*choicelist)
702
C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\stride_tricks.py in broadcast_arrays(*args, **kwargs)
257 args = [np.array(_m, copy=False, subok=subok) for _m in args]
258
--> 259 shape = _broadcast_shape(*args)
260
261 if all(array.shape == shape for array in args):
C:\ProgramData\Anaconda3\lib\site-packages\numpy\lib\stride_tricks.py in _broadcast_shape(*args)
191 # use the old-iterator because np.nditer does not handle size 0 arrays
192 # consistently
--> 193 b = np.broadcast(*args[:32])
194 # unfortunately, it cannot handle 32 or more arguments directly
195 for pos in range(32, len(args), 31):
ValueError: shape mismatch: objects cannot be broadcast to a single shape
Here is the code being used:
condition = [(train_df3_dummies.loc[(train_df3_dummies['credit_model_C5'] == 1) & (train_df3_dummies['credit_number'] == 600)])
,(train_df3_dummies.loc[(train_df3_dummies['credit_model_C5'] == 1) & (train_df3_dummies['credit_number'] == 675)])
,(train_df3_dummies.loc[(train_df3_dummies['credit_model_C5'] == 1) & (train_df3_dummies['credit_number'] == 710)])
,(train_df3_dummies.loc[(train_df3_dummies['credit_model_C5'] == 1) & (train_df3_dummies['credit_number'] == 745)])
,(train_df3_dummies.loc[(train_df3_dummies['credit_model_C5'] == 1) & (train_df3_dummies['credit_number'] == 999)])]
replace = [600, 675, 710, 745, 999]
train_df3_dummies['credit_C5_score'] = np.select(condition, replace, default = 1)
I've seen this error applied to geometric problems on here, but not to numpy.select. Any ideas?
Upvotes: 1
Views: 1950
Reputation: 150785
The mismatch is most likely caused by this:
train_df3_dummies.loc[(train_df3_dummies['credit_model_C5'] == 1 ....
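That is, each of your conditions has a different length: .loc returns the matching rows rather than a boolean mask, so every entry in the list is a filtered DataFrame with its own row count. Get rid of the loc and pass the boolean expressions directly, that is: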
condition = [(train_df3_dummies['credit_model_C5'] == 1) & (train_df3_dummies['credit_number'] == 600), ...]
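To make the difference concrete, here is a minimal sketch on made-up data. The frame `df` and its values are hypothetical stand-ins for `train_df3_dummies`, and `scores` plays the role of `replace`. It shows that the .loc results have different row counts, while the plain boolean masks all have one entry per row, which is what np.select needs.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for train_df3_dummies
df = pd.DataFrame({
    "credit_model_C5": [1, 1, 0, 1, 0, 1, 1],
    "credit_number":   [600, 675, 600, 999, 710, 500, 600],
})

scores = [600, 675, 710, 745, 999]

# What the question built: .loc returns filtered DataFrames,
# and each one has however many rows happened to match.
filtered = [
    df.loc[(df["credit_model_C5"] == 1) & (df["credit_number"] == s)]
    for s in scores
]
filtered_lengths = [len(f) for f in filtered]

# What np.select expects: boolean masks, each as long as df itself.
masks = [
    (df["credit_model_C5"] == 1) & (df["credit_number"] == s)
    for s in scores
]
mask_lengths = [len(m) for m in masks]

# With equal-length masks the call broadcasts cleanly.
df["credit_C5_score"] = np.select(masks, scores, default=1)
```

Rows that match none of the masks (wrong model flag, or a number outside `scores`) fall through to the default of 1.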
Since each replacement value is the same as the credit_number it matches, you can also do:
s = ((train_df3_dummies['credit_model_C5'] == 1) &
train_df3_dummies['credit_number'].isin(replace)
)
train_df3_dummies['credit_C5_score'] = np.where(s,
train_df3_dummies['credit_number'], 1)
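As a quick sanity check, the sketch below compares the np.where shortcut with the np.select version on the same kind of made-up data. The frame `df` and the names `via_where` and `via_select` are hypothetical, and the equivalence relies on every replacement value being identical to the number it matches, as in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for train_df3_dummies
df = pd.DataFrame({
    "credit_model_C5": [1, 1, 0, 1, 0, 1, 1],
    "credit_number":   [600, 675, 600, 999, 710, 500, 600],
})

scores = [600, 675, 710, 745, 999]

# Single mask: model flag set AND the number is one of the listed scores
in_scores = (df["credit_model_C5"] == 1) & df["credit_number"].isin(scores)
via_where = np.where(in_scores, df["credit_number"], 1)

# Same result spelled out with one mask per score
masks = [
    (df["credit_model_C5"] == 1) & (df["credit_number"] == s)
    for s in scores
]
via_select = np.select(masks, scores, default=1)

same_result = np.array_equal(via_where, via_select)
```

If the replacement values ever differ from the matched numbers, the np.where form no longer applies and the np.select form is the one to keep.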
Upvotes: 1