user10169126


How do you conditionally assign the values to a column?

 Customer_ID     Gender  First_Date First_region  First_state        First_city  \
0    129609144      M    20130130         West      Gujarat             Surat   
1    129627580      M    20130129        North        Delhi             Delhi   
2    130363481      M    20130221         West      Gujarat             Surat   
3     49817480      M    20130222         West  Maharashtra  Pimpri-Chinchwad   
4    126343829      F    20130301        North        Delhi             Delhi   

   Recent_Date    Last_region   Last_state         Last_city  Customer_Value  \
0     20130216        West      Gujarat             Surat          2032.0   
1     20130129       North        Delhi             Delhi          1709.0   
2     20130221        West      Gujarat             Surat           523.0   
3     20130222        West  Maharashtra  Pimpri-Chinchwad          5132.0   
4     20130301       North        Delhi             Delhi          1008.0   

   Buy_Times  Points_Earned  Points_Redeemed  
0          2          200.0              0.0  
1          1          100.0              0.0  
2          1           10.0              0.0  
3          1          170.0              0.0  
4          1           60.0              0.0    

I'm trying to create a new column named 'customer value segment', and I want to assign its values based on the values in the column 'Customer_Value'.

So I tried this:

df['customer value segment'] = np.where(df['Customer_Value'] > 25000, 'High Value Segment', np.where(10000 > df['Customer_Value'] > 25000, 'Medium Value Segment', np.where(df['Customer_Value'] <= 10000, 'Low Value Segment', 'None')))  

But no luck. It throws the following error:

 ValueError                                Traceback (most recent call last)
<ipython-input-48-fee1062f32ba> in <module>
----> 1 df['customer value segment'] = np.where(df['Customer_Value'] > 25000, 'High Value Segment', np.where(10000 > df['Customer_Value'] > 25000, 'Medium Value Segment', np.where(df['Customer_Value'] <= 10000, 'Low Value Segment', 'None')))

~\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1476         raise ValueError("The truth value of a {0} is ambiguous. "
   1477                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1478                          .format(self.__class__.__name__))
   1479 
   1480     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().  

How should I approach this now?

Note: in case you want to read the actual dataset, this is how I loaded it:

df = pd.read_csv('Customers.csv', encoding='unicode_escape')

Upvotes: 1

Views: 103

Answers (3)

Oskar_U

Reputation: 482

Try the following list comprehension:

df["customer value segment"] = ["High Value Segment" if x > 25000 else "Medium Value Segment" if x > 10000 else "Low Value Segment" for x in df["Customer_Value"]]
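
The same thresholds can also be written without a Python-level loop. This is only a minimal sketch using np.select, on a small made-up frame (the values are illustrative and do not come from Customers.csv). The conditions are tested in order, so the second one only applies to rows that failed the first.

```python
import numpy as np
import pandas as pd

# Illustrative data only; the real values would come from Customers.csv.
df = pd.DataFrame({"Customer_Value": [2032.0, 10000.0, 15000.0, 25000.0, 30000.0]})

# Conditions are checked in order; the first one that matches wins.
conditions = [
    df["Customer_Value"] > 25000,
    df["Customer_Value"] > 10000,
]
choices = ["High Value Segment", "Medium Value Segment"]

# Rows matching neither condition fall back to the default label.
df["customer value segment"] = np.select(
    conditions, choices, default="Low Value Segment"
)
print(df)
```

Missing values compare as False in both conditions, so with this sketch they would end up in the default (low) segment; whether that is wanted depends on the data.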

Upvotes: 0

Snake_py

Reputation: 582

Hmm

NumPy wants array-like input, so did you try operating on arrays instead of the DataFrame? My first guess was that the string arguments cause the trouble, but np.where does accept plain strings as its second and third arguments. The more likely culprit is the chained comparison 10000 > df['Customer_Value'] > 25000, which asks Python for a single True/False from a whole Series.

Personally, I would just iterate over the DataFrame and check each value with if/elif branches.

newCol = []

# Walk the rows and pick a segment label for each Customer_Value
for ind in df.index:
    value = df['Customer_Value'][ind]
    if value > 25000:
        newCol.append('High Value Segment')
    elif value > 10000:
        newCol.append('Medium Value Segment')
    else:
        newCol.append('Low Value Segment')

Then assign the list as the new column, e.g. df['customer value segment'] = newCol. I typed this directly here, so the indentation may be off and you may have to fix it in your editor. Let me know if it works.
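
To check where the ValueError in the question comes from, the sketch below tries a chained comparison on a Series and then builds the same mask with &. The data and the variable names (value, chained_ok, is_medium) are made up for illustration. It assumes the thresholds the question seems to intend: above 25000 is high, above 10000 and up to 25000 is medium, everything else is low.

```python
import numpy as np
import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"Customer_Value": [523.0, 10000.0, 12000.0, 25000.0, 40000.0]})
value = df["Customer_Value"]

# A chained comparison is evaluated as (a < s) and (s <= b); the `and`
# asks the Series for a single truth value, which raises ValueError.
try:
    10000 < value <= 25000
    chained_ok = True
except ValueError:
    chained_ok = False

# Element-wise version: two comparisons joined with &, each in parentheses.
is_medium = (value > 10000) & (value <= 25000)

# np.where takes plain strings for the "true" and "false" branches.
df["customer value segment"] = np.where(
    value > 25000,
    "High Value Segment",
    np.where(is_medium, "Medium Value Segment", "Low Value Segment"),
)
print(chained_ok)
print(df)
```

If this runs as expected, the strings passed to np.where are not the problem; only the chained comparison needs rewriting.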

Upvotes: 0

oppressionslayer

Reputation: 7214

This should work:

df.loc[df['Customer_Value'] > 25000, 'customer value segment'] = 'High Value Segment'
df.loc[(df['Customer_Value'] > 10000) & (df['Customer_Value'] <= 25000), 'customer value segment'] = 'Medium Value Segment'
df.loc[df['Customer_Value'] <= 10000, 'customer value segment'] = 'Low Value Segment'
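
Because the segments are contiguous ranges of a single column, binning is another option. This is a sketch with pd.cut on made-up values, assuming the boundaries from the question (up to 10000 is low, above 25000 is high).

```python
import numpy as np
import pandas as pd

# Illustrative data only, chosen to sit on and around the boundaries.
df = pd.DataFrame({"Customer_Value": [523.0, 10000.0, 10000.5, 25000.0, 25000.5]})

# Bins are right-inclusive by default:
# (-inf, 10000], (10000, 25000], (25000, inf)
df["customer value segment"] = pd.cut(
    df["Customer_Value"],
    bins=[-np.inf, 10000, 25000, np.inf],
    labels=["Low Value Segment", "Medium Value Segment", "High Value Segment"],
)
print(df)
```

Note that pd.cut returns a categorical column and leaves missing values as NaN instead of assigning them a segment.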

Upvotes: 1
