Get the first and second return values of a function called twice in a for loop

Question

I have this function that is part of a file that I created for the K-Means clustering algorithm.

def assign_to_cluster_mean_centroid(x_in=x, centroids_in=centroids, n_user=n):
    '''This function calculates the Euclidean distance between each data point and
    a cluster centroid. It then allocates each data point to the centroid that it is the
    closest to in distance.'''
    distances_arr_re = np.reshape(distance_between(
        centroids_in, x_in[0]), (len(centroids_in), len(x_in[0])))
    datapoint_cen = []
    distances_min = []  # Done if needed
    for value in zip(*distances_arr_re):
        distances_min.append(min(value))
        datapoint_cen.append(np.argmin(value)+1)

    clusters = {}
    for no_user in range(0, n_user):
        clusters[no_user+1] = []

    for d_point, cent in zip(x_in[0], datapoint_cen):
        clusters[cent].append(d_point)

    # Run a for loop and rewrite the centroids
    # with the newly calculated means
    for i, cluster in enumerate(clusters):
        reshaped = np.reshape(clusters[cluster], (len(clusters[cluster]), 2))
        centroids_in[i][0] = sum(reshaped[0:, 0])/len(reshaped[0:, 0])
        centroids_in[i][1] = sum(reshaped[0:, 1])/len(reshaped[0:, 1])
    print('Centroids for this iteration are:' + str(centroids_in))
    return datapoint_cen, clusters

This function returns two values: a list (datapoint_cen) containing the label of the nearest centroid for each data point, derived from the calculated distances, and a dictionary (clusters) that maps each cluster to the data points allocated to it.

I then have a main loop in which I call this function twice, as shown below:

# Create the dataframe for visualisation
cluster_data = pd.DataFrame({'Birth Rate': x[0][0:, 0],
                             'Life Expectancy': x[0][0:, 1],
                             'label': assign_to_cluster_mean_centroid()[0],
                             'Country': x[1]})

and also

mean = assign_to_cluster_mean_centroid()[1]

My problem is that when I call the function a second time to assign its result to the variable "mean", it recalculates everything and returns a new set of values for the clusters. On the second call I need the clusters from the first call in order for my algorithm to be accurate. Any assistance will be much appreciated.

Ridhaa Cupido · Accepted Answer

Could you call the function once and store its result in a variable?

e.g.

assigning = assign_to_cluster_mean_centroid()

Then index into that variable later, e.g.

cluster_data = pd.DataFrame({'Birth Rate': x[0][0:, 0],
                             'Life Expectancy': x[0][0:, 1],
                             'label': assigning[0],
                             'Country': x[1]})

And later:

mean = assigning[1]

I don't know what loops we are currently inside at this point, so I'm not 100% sure you won't have scope problems.
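
To illustrate why storing the result matters, here is a minimal sketch. It assumes the function changes shared state on every call, the way yours rewrites the centroids. The names step, snapshot and the toy one-element centroids list are made up for the illustration and are not from your code.

```python
# Toy shared state, mutated in place like the centroids in the question.
centroids = [0.0]

def step(centroids_in=centroids):
    """Hypothetical stand-in: return (label, snapshot), then move the centroid."""
    label = round(centroids_in[0])
    snapshot = list(centroids_in)
    centroids_in[0] += 1.0  # side effect: the next call starts from new state
    return label, snapshot

# Two separate calls: the two pieces come from different iterations.
label_a = step()[0]      # computed while the centroid was 0.0
snapshot_b = step()[1]   # computed while the centroid was 1.0

# One call, stored once: both pieces come from the same iteration.
result = step()          # computed while the centroid was 2.0
label_c, snapshot_c = result[0], result[1]
```

Here label_a and snapshot_b do not belong together, while label_c and snapshot_c do, because they were produced by the same call.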

Alternatively, you can unpack using multiple assignment.

e.g.

label, mean = assign_to_cluster_mean_centroid()

That means the second part (mean) is already done; you just need to build the DataFrame:

cluster_data = pd.DataFrame({'Birth Rate': x[0][0:, 0],
                             'Life Expectancy': x[0][0:, 1],
                             'label': label,
                             'Country': x[1]})
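
Since I can't see your main loop, here is a rough, self-contained sketch of how the unpacking could sit inside it. It is not your implementation: assign_and_update, the 1-D points and the starting centroids are all hypothetical, and it uses plain Python instead of NumPy to keep it short.

```python
def assign_and_update(points, centroids):
    """One k-means step on 1-D data (illustrative only).

    Returns (labels, clusters) and updates centroids in place.
    """
    # Label each point with the index of its nearest centroid.
    labels = [min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
              for p in points]
    # Group the points by label.
    clusters = {k: [] for k in range(len(centroids))}
    for p, k in zip(points, labels):
        clusters[k].append(p)
    # Move each centroid to the mean of its cluster.
    for k, members in clusters.items():
        if members:  # leave an empty cluster's centroid where it is
            centroids[k] = sum(members) / len(members)
    return labels, clusters

points = [1.0, 2.0, 9.0, 10.0]
centroids = [0.0, 5.0]

for _ in range(3):
    # Exactly one call per iteration; both values describe the same step.
    labels, clusters = assign_and_update(points, centroids)
```

After the loop, labels and clusters both come from the last call, so the label column and the clusters used for the means stay consistent.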

Hope this helps.
