Clustering similar values in a DataFrame based on averages

Question

I have a DataFrame with zone-wise sales records, and I need to cluster them based on average sales:

Zone         Consumption
North          1
South          3
East           10
North          8
North2         0
South          5

I used the code below:

def Clustering(row):
    if row['Consumption']<.5*np.mean(['Consumption']):
        val='E'
    elif row['Consumption']<.75*np.mean(['Consumption']):
        val='D'
    elif row['Consumption']<1*np.mean(['Consumption']):
        val='C'
    elif row['Consumption']<1.5*np.mean(['Consumption']):
        val='B'
    elif row['Consumption']<2.5*np.mean(['Consumption']):
        val='A'
    else:
        val='Z'
    return val

Traceback

 in Clustering(row)
      1 def Clustering(row):
----> 2     if row['Consumption']<.5*np.mean(['Consumption']):
      3         val='E'
      4     elif row['Consumption']<.75*np.mean(['Consumption']):
      5         val='D'

<__array_function__ internals> in mean(*args, **kwargs)

~\anaconda3\lib\site-packages\numpy\core\fromnumeric.py in mean(a, axis, dtype, out, keepdims)
   3333 
   3334     return _methods._mean(a, axis=axis, dtype=dtype,
-> 3335                           out=out, **kwargs)
   3336 
   3337 

~\anaconda3\lib\site-packages\numpy\core\_methods.py in _mean(a, axis, dtype, out, keepdims)
    149             is_float16_result = True
    150 
--> 151     ret = umr_sum(arr, axis, dtype, out, keepdims)
    152     if isinstance(ret, mu.ndarray):
    153         ret = um.true_divide(

TypeError: cannot perform reduce with flexible type

I assumed the error was caused by the sales (Consumption) column containing some string values, but that isn't the case. How should I go about fixing this?
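For reference, the TypeError does not come from the data. Each `np.mean(['Consumption'])` call averages a Python list holding the string `'Consumption'` (the column name), not the column itself, so NumPy ends up trying to sum a string array, which is what raises "cannot perform reduce with flexible type" in the traceback. Below is a minimal sketch of the row-wise approach with that fixed. It assumes the DataFrame is named `df` (the question never shows its name) and passes the mean in as an extra `mean` argument (a hypothetical parameter, not in the original) so it is computed once rather than on every row:

```python
import pandas as pd

# Sample data from the question; the name `df` is an assumption
df = pd.DataFrame({
    'Zone': ['North', 'South', 'East', 'North', 'North2', 'South'],
    'Consumption': [1, 3, 10, 8, 0, 5],
})

def clustering(row, mean):
    """Label one row by comparing its Consumption with the column mean."""
    value = row['Consumption']
    if value < .5 * mean:
        return 'E'
    elif value < .75 * mean:
        return 'D'
    elif value < 1 * mean:
        return 'C'
    elif value < 1.5 * mean:
        return 'B'
    elif value < 2.5 * mean:
        return 'A'
    return 'Z'

# Take the mean of the column itself, once, instead of np.mean(['Consumption'])
mean = df['Consumption'].mean()
# axis=1 passes each row to clustering; args supplies the extra `mean` argument
df['Cluster'] = df.apply(clustering, axis=1, args=(mean,))
print(df)
```

With this sample data the mean is 4.5, so the labels come out as E, D, A, A, E, B.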

Code Different · Accepted Answer

Have you tried pd.cut? Assuming df['Consumption'].mean() > 0 (otherwise the scaled bin edges are not strictly increasing and pd.cut rejects them):

import numpy as np
import pandas as pd

# Define the bins, which are double-ended by -INF and INF
# (-np.inf works on all NumPy versions; np.NINF was removed in NumPy 2.0)
bins = np.array([.5, .75, 1, 1.5, 2.5]) * df['Consumption'].mean()
bins = np.hstack((-np.inf, bins, np.inf))

df['Cluster'] = pd.cut(df['Consumption'], bins, labels=list('EDCBAZ')).astype('str')
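A self-contained run of the pd.cut approach on the question's sample table is sketched below. One detail worth hedging: pd.cut closes each bin on the right by default, i.e. (a, b], while the question's if/elif chain uses strict `<`, i.e. [a, b). Passing `right=False` (an existing pd.cut parameter) should make a value that lands exactly on a threshold get the same label it would get from the original function:

```python
import numpy as np
import pandas as pd

# The sample table from the question
df = pd.DataFrame({
    'Zone': ['North', 'South', 'East', 'North', 'North2', 'South'],
    'Consumption': [1, 3, 10, 8, 0, 5],
})

mean = df['Consumption'].mean()  # 4.5 for this data
# Bin edges as multiples of the mean, padded with -inf and inf
bins = np.hstack((-np.inf, np.array([.5, .75, 1, 1.5, 2.5]) * mean, np.inf))

# right=False makes each bin [a, b), matching the strict `<` comparisons
df['Cluster'] = pd.cut(df['Consumption'], bins,
                       labels=list('EDCBAZ'), right=False).astype(str)
print(df)
```

On this data no value sits exactly on an edge (2.25, 3.375, 4.5, 6.75, 11.25), so both settings of `right` yield E, D, A, A, E, B; the difference only shows up for a value such as 2.25, which is E with the default and D with `right=False`.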
