Reputation: 381
I am trying to create a new variable based on a condition. In the example below, I want to create a new column such that if recency is greater than the mean of recency and frequency is greater than 3 times its standard deviation, I assign 2, otherwise 0. Below are the code and the error it produces:
import pandas as pd
import numpy as np
# initialise data of lists.
data = {'cust':["a1", "a2", "a3", "a4", "a5", "a6", "a7", "a8", "a9", "a10", "a11", "a12", "a13", "a14", "a15", "a16", "a17", "a18", "a19", "a20", "a21", "a22", "a23", "a24", "a25", "a26", "a27", "a28", "a29", "a30", "a31", "a32", "a33", "a34", "a35", "a36", "a37", "a38", "a39", "a40", "a41", "a42", "a43", "a44", "a45", "a46", "a47", "a48", "a49", "a50", "a51"],
'recency':[3, 7, 9, 9, 6, 8, 3, 9, 6, 5, 8, 6, 2, 8, 3, 3, 2, 7, 3, 1, 7, 6, 10, 6, 2, 8, 6, 10, 2, 7, 9, 1, 1, 3, 6, 4, 6, 4, 6, 6, 7, 3, 7, 9, 6, 4, 7, 3, 1, 9, 3],
'frequency':[15, 9, 13, 9, 19, 1, 11, 20, 20, 15, 15, 18, 1, 9, 20, 14, 11, 11, 4, 15, 1, 8, 17, 19, 13, 20, 1, 11, 3, 8, 2, 4, 15, 5, 12, 15, 20, 6, 19, 2, 6, 12, 6, 6, 4, 7, 2, 3, 20, 13, 11],
'monetary':[8854, 5614, 2687, 3553, 1801, 1076, 9724, 7778, 8382, 4391, 6766, 9905, 3181, 4170, 7544, 2997, 3025, 9358, 6015, 9919, 5132, 3598, 8779, 4420, 8931, 1492, 5491, 4186, 4720, 2568, 2530, 4618, 4109, 9384, 3000, 9766, 9524, 1027, 6315, 9806, 3442, 7256, 2432, 2429, 7696, 4527, 1802, 6606, 3018, 6295, 2985]}
# Create DataFrame
df = pd.DataFrame(data)
df['cluster']=np.where(df['recency']>df['recency'].mean() & df['frequency']>df['frequency'].mean()+
df['frequency'].std() ,2,0)
TypeError: Cannot perform 'rand_' with a dtyped [int64] array and scalar of type [bool]
Upvotes: 0
Views: 39
Reputation: 862441
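I think the parentheses () around the conditions are missing. They are needed because of operator precedence: & binds more tightly than >, so each comparison has to be wrapped before the two are combined: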
df['cluster'] = np.where((df['recency']>df['recency'].mean()) &
(df['frequency']>df['frequency'].mean()+df['frequency'].std()),2,0)
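If the nested parentheses are hard to read, one option is to name each condition first and then combine them. This is only a minimal sketch on a made-up four-row frame (the small `df`, and the names `high_recency`, `high_frequency` and `alt` are illustrative, not from the question); the threshold is mean plus one standard deviation, as in the code above.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame (not the question's data).
df = pd.DataFrame({
    "recency": [3, 7, 9, 2],
    "frequency": [15, 9, 20, 1],
})

# Build each boolean mask separately, so no bare comparison sits next to `&`.
high_recency = df["recency"] > df["recency"].mean()
high_frequency = df["frequency"] > df["frequency"].mean() + df["frequency"].std()

# Combine the two masks element-wise and map True -> 2, False -> 0.
df["cluster"] = np.where(high_recency & high_frequency, 2, 0)

# Same logic with the .gt() comparison method, which avoids the
# precedence problem because method calls bind before `&`.
alt = np.where(
    df["recency"].gt(df["recency"].mean())
    & df["frequency"].gt(df["frequency"].mean() + df["frequency"].std()),
    2,
    0,
)

print(df)
```

Without the parentheses, Python evaluates `df['recency'].mean() & df['frequency']` first, which is what leads to the TypeError in the question.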
Upvotes: 1