Customer_ID Gender First_Date First_region First_state First_city \
0 129609144 M 20130130 West Gujarat Surat
1 129627580 M 20130129 North Delhi Delhi
2 130363481 M 20130221 West Gujarat Surat
3 49817480 M 20130222 West Maharashtra Pimpri-Chinchwad
4 126343829 F 20130301 North Delhi Delhi
Recent_Date Last_region Last_state Last_city Customer_Value \
0 20130216 West Gujarat Surat 2032.0
1 20130129 North Delhi Delhi 1709.0
2 20130221 West Gujarat Surat 523.0
3 20130222 West Maharashtra Pimpri-Chinchwad 5132.0
4 20130301 North Delhi Delhi 1008.0
Buy_Times Points_Earned Points_Redeemed
0 2 200.0 0.0
1 1 100.0 0.0
2 1 10.0 0.0
3 1 170.0 0.0
4 1 60.0 0.0
I'm trying to create a new column named 'customer value segment' and assign its values based on the values in the 'Customer_Value' column (above 25000: High, between 10000 and 25000: Medium, 10000 or below: Low).
I tried this:
df['customer value segment'] = np.where(df['Customer_Value'] > 25000, 'High Value Segment', np.where(10000 > df['Customer_Value'] > 25000, 'Medium Value Segment', np.where(df['Customer_Value'] <= 10000, 'Low Value Segment', 'None')))
But no luck. It throws the following error:
ValueError Traceback (most recent call last)
<ipython-input-48-fee1062f32ba> in <module>
----> 1 df['customer value segment'] = np.where(df['Customer_Value'] > 25000, 'High Value Segment', np.where(10000 > df['Customer_Value'] > 25000, 'Medium Value Segment', np.where(df['Customer_Value'] <= 10000, 'Low Value Segment', 'None')))
~\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
1476 raise ValueError("The truth value of a {0} is ambiguous. "
1477 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1478 .format(self.__class__.__name__))
1479
1480 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
How should I approach this now?
Note: in case you want to read the actual dataset, this is how I loaded it:
df = pd.read_csv('Customers.csv', encoding='unicode_escape')
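For reference, here is why the original line fails, plus a minimal fix that keeps the nested np.where. Python expands a chained comparison like 10000 > s > 25000 into (10000 > s) and (s > 25000), and the "and" forces pandas to turn a whole boolean Series into a single True/False, which raises the ValueError above. The middle condition is also inverted: no number is both below 10000 and above 25000, so the intended range is 10000 < x <= 25000. The data below is the five sample Customer_Value rows from the question plus two made-up values (15000.0 and 30000.0, hypothetical) so every branch is exercised.

```python
import numpy as np
import pandas as pd

# Five sample values from the question, plus two hypothetical ones (15000.0, 30000.0)
df = pd.DataFrame({"Customer_Value": [2032.0, 1709.0, 523.0, 5132.0, 1008.0, 15000.0, 30000.0]})
s = df["Customer_Value"]

# A chained comparison becomes `(10000 < s) and (s <= 25000)`; the `and`
# calls bool() on a Series, which pandas refuses with the same ValueError.
try:
    10000 < s <= 25000
    chained_ok = True
except ValueError as exc:
    chained_ok = False
    print("Chained comparison fails:", exc)

# Element-wise version: build each mask separately and combine them with `&`.
df["customer value segment"] = np.where(
    s > 25000,
    "High Value Segment",
    np.where((s > 10000) & (s <= 25000), "Medium Value Segment", "Low Value Segment"),
)
print(df)
```

The parentheses around each comparison matter, because & binds more tightly than > and <=.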
Upvotes: 1
Views: 103
Reputation: 482
Try the following list comprehension:
df["customer value segment"] = ["High Value Segment" if x > 25000 else "Medium Value Segment" if x > 10000 else "Low Value Segment" for x in df["Customer_Value"]]
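Because the comprehension handles one value at a time, plain Python if/else works and no Series is ever converted to a single bool. A quick boundary check, using hypothetical values placed on and just past the 10000 and 25000 cut-offs, confirms that 10000 lands in Low and 25000 in Medium, matching the ranges in the question:

```python
import pandas as pd

# Hypothetical values chosen to sit on and just past each cut-off
df = pd.DataFrame({"Customer_Value": [523.0, 10000.0, 10000.5, 25000.0, 25000.5]})

# Conditions are tried left to right, so the Medium test only sees values <= 25000.
df["customer value segment"] = [
    "High Value Segment" if x > 25000
    else "Medium Value Segment" if x > 10000
    else "Low Value Segment"
    for x in df["Customer_Value"]
]
print(df["customer value segment"].tolist())
```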
Upvotes: 0
Reputation: 582
NumPy says it wants an array-like object; did you try operating on arrays instead of the DataFrame? Also, I think the second argument of np.where should be an array, not a string. I'm only guessing that the string causes the trouble, so try putting it in brackets.
But I would actually just iterate over the DataFrame and check each value with if/elif statements:
newCol = []
for ind in df.index:
    value = df['Customer_Value'][ind]
    if value > 25000:
        newCol.append('High Value Segment')
    elif value > 10000:  # Python uses elif; values reaching here are <= 25000
        newCol.append('Medium Value Segment')
    else:
        newCol.append('Low Value Segment')
df['customer value segment'] = newCol
The last line assigns the list as the new column. Let me know if it worked.
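On the guess about strings: np.where broadcasts scalar arguments against the condition, so passing plain strings is fine and wrapping them in brackets is not needed. The ValueError in the question comes from the chained comparison instead. A quick check with made-up values:

```python
import numpy as np

# Hypothetical Customer_Value figures
values = np.array([523.0, 15000.0, 30000.0])

# Both string arguments are scalars; NumPy broadcasts them to the shape of the condition.
labels = np.where(values > 25000, "High Value Segment", "Other")
print(labels)  # ['Other' 'Other' 'High Value Segment']
```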
Upvotes: 0
Reputation: 7214
This should work:
df.loc[df['Customer_Value'] > 25000, 'customer value segment'] = 'High Value Segment'
df.loc[(df['Customer_Value'] > 10000) & (df['Customer_Value'] <= 25000), 'customer value segment'] = 'Medium Value Segment'
df.loc[df['Customer_Value'] <= 10000, 'customer value segment'] = 'Low Value Segment'
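The three .loc assignments can also be collapsed into a single vectorized call with np.select, which takes a list of boolean masks and a matching list of labels and picks the label of the first mask that is True for each row. This is an alternative technique, not part of the answer above; the values are hypothetical and chosen to hit every segment, including both cut-offs:

```python
import numpy as np
import pandas as pd

# Hypothetical values covering every segment and both boundaries
df = pd.DataFrame({"Customer_Value": [523.0, 10000.0, 15000.0, 25000.0, 30000.0]})
s = df["Customer_Value"]

# Masks are checked in order, so the second one only matters for values <= 25000.
conditions = [s > 25000, s > 10000]
choices = ["High Value Segment", "Medium Value Segment"]

# Rows matching neither mask (<= 10000) get the default label.
df["customer value segment"] = np.select(conditions, choices, default="Low Value Segment")
print(df)
```

Unlike the separate .loc assignments, np.select never leaves a row as NaN, because the default catches every value the masks miss.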
Upvotes: 1