happinessiskey

Reputation: 97

ValueError: x and y must be the same size In Python while creating KMeans Model

I'm building a KMeans clustering model on a churn dataset, and I get ValueError: x and y must be the same size when I try to create the cluster graph.

I'll post both my function and the graph code here in a sec, but in trying to narrow it down, I think it may have something to do with this line of code in the function:

x=kmeans.cluster_centers_[:,0]
                , y=kmeans.cluster_centers_[:,1]

Here's the full code

def Create_kmeans_cluster_graph(df_final, data, n_clusters, x_title, y_title, chart_title):
    """ Display K-means cluster based on data """
    
    kmeans = KMeans(n_clusters=n_clusters # No of cluster in data
                    , random_state = random_state # Selecting same training data
                   ) 

    kmeans.fit(data)
    kmean_colors = [plotColor[c] for c in kmeans.labels_]


    fig = plt.figure(figsize=(12,8))
    plt.scatter(x= x_title + '_norm'
                , y= y_title + '_norm'
                , data=data 
                , color=kmean_colors # color of data points
                , alpha=0.25 # transparancy of data points
               )

    plt.xlabel(x_title)
    plt.ylabel(y_title)

    plt.scatter(x=kmeans.cluster_centers_[:,0]
                , y=kmeans.cluster_centers_[:,1]
                , color='black'
                , marker='X' # Marker sign for data points
                , s=100 # marker size
               )
    
    plt.title(chart_title,fontsize=15)
    plt.show()
    
    return kmeans.fit_predict(df_final[df_final.Churn==1][[x_title+'_norm', y_title +'_norm']])



//Graph

df_final['Cluster'] = -1 # by default set Cluster to -1
df_final.iloc[(df_final.Churn==1),'Cluster'] = Create_kmeans_cluster_graph(df_final
                            ,df_final[df_final.Churn==1][['Tenure_norm','MonthlyCharge_norm']]
                            ,3
                           ,'Tenure'
                           ,'MonthlyCharges'
                           ,"Tenure vs Monthlycharges : Churn customer cluster")

df_final['Cluster'].unique()


Upvotes: 0

Views: 117

Answers (1)

StupidWolf

Reputation: 46898

You get that error because of this line:

plt.scatter(x= x_title + '_norm'
                , y= y_title + '_norm'
                , data=data 
                , color=kmean_colors # color of data points
                , alpha=0.25 # transparancy of data points
               )

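plt.scatter does accept data=, but a string passed as x or y is only replaced by a column when it matches a column name in data. Your call passes 'MonthlyCharges', so the lookup is for 'MonthlyCharges_norm', while the column is named 'MonthlyCharge_norm'. The unmatched string is kept as a single literal value, so x and y end up with different sizes. Pass 'MonthlyCharge' as the y title, and to make a wrong name fail with a clear KeyError, you can either index the columns explicitly: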

plt.scatter(data[x_title + '_norm'],data[y_title + '_norm'],...)

Or use the plot.scatter method of the pandas DataFrame, which I did in an edited version of your function:

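def Create_kmeans_cluster_graph(df_final, data, n_clusters, x_title, y_title, chart_title):
    """ Fit K-means on data, plot the clusters and return one label per row of data """
    plotColor = ['k','g','b']  # one colour per cluster, so this supports n_clusters <= 3
    # random_state is expected to be defined at module level
    kmeans = KMeans(n_clusters=n_clusters, random_state=random_state)

    kmeans.fit(data)
    kmean_colors = [plotColor[c] for c in kmeans.labels_]

    # DataFrame.plot.scatter looks the columns up by name
    data.plot.scatter(x=x_title + '_norm', y=y_title + '_norm',
                      color=kmean_colors, alpha=0.25)

    plt.xlabel(x_title)
    plt.ylabel(y_title)

    # Cluster centres: column 0 is the x coordinate, column 1 the y coordinate
    plt.scatter(x=kmeans.cluster_centers_[:,0], y=kmeans.cluster_centers_[:,1],
                color='black', marker='X', s=100)
    plt.title(chart_title, fontsize=15)

    return kmeans.labels_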

On an example dataset, it works:

import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
random_state = 42

np.random.seed(42)

df_final = pd.DataFrame({'Tenure_norm':np.random.uniform(0,1,50),
                         'MonthlyCharge_norm':np.random.uniform(0,1,50),
                        'Churn':np.random.randint(0,3,50)})

Create_kmeans_cluster_graph(df_final
                            ,df_final[df_final.Churn==1][['Tenure_norm','MonthlyCharge_norm']]
                            ,3
                           ,'Tenure'
                           ,'MonthlyCharge'
                           ,"Tenure vs Monthlycharges : Churn customer cluster")
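
Note that the call in the question passes 'MonthlyCharges' while the column is named 'MonthlyCharge_norm'. A small guard can surface that kind of mismatch before anything is plotted. The following is only a sketch: scatter_columns is a hypothetical helper (not part of matplotlib or pandas), and the data is randomly generated for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
data = pd.DataFrame({"Tenure_norm": rng.uniform(0, 1, 20),
                     "MonthlyCharge_norm": rng.uniform(0, 1, 20)})

def scatter_columns(data, x_col, y_col, **kwargs):
    """Hypothetical helper: scatter two DataFrame columns, failing early on a bad name."""
    missing = [c for c in (x_col, y_col) if c not in data.columns]
    if missing:
        raise KeyError(f"columns not found in data: {missing}")
    # Passing the columns themselves guarantees x and y have the same length.
    return plt.scatter(data[x_col], data[y_col], **kwargs)

# A misspelled name fails with a message that names the offending column.
try:
    scatter_columns(data, "Tenure_norm", "MonthlyCharges_norm")
    error_message = None
except KeyError as exc:
    error_message = str(exc)

# The correct names plot one point per row.
points = scatter_columns(data, "Tenure_norm", "MonthlyCharge_norm", alpha=0.25)
n_points = len(points.get_offsets())
```

With a check like this, a wrong title argument shows up as a KeyError naming the column rather than as a size mismatch inside matplotlib.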

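
One more thing to watch in the calling code from the question: .iloc is position-based and does not accept a column label such as 'Cluster', so the assignment is likely to fail once the plot itself works. A minimal sketch of the same assignment with .loc and a boolean mask follows. The labels array is a made-up stand-in for the value returned by Create_kmeans_cluster_graph, and the small frame is illustrative only.

```python
import numpy as np
import pandas as pd

df_final = pd.DataFrame({"Tenure_norm": np.linspace(0, 1, 6),
                         "Churn": [1, 0, 1, 1, 0, 1]})

# Stand-in for the labels returned by Create_kmeans_cluster_graph (made-up values).
labels = np.array([0, 2, 1, 0])

df_final["Cluster"] = -1                   # default for rows that were not clustered
churned = df_final["Churn"] == 1           # boolean mask aligned on the index
df_final.loc[churned, "Cluster"] = labels  # one label per churned row, in row order
```

This relies on the labels being in the same row order as the filtered frame passed to fit, which holds when the same mask selects the rows in both places.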

Upvotes: 1
