Reputation: 65
I have an array with 100 rows and 5 columns. I would like to split the rows into separate arrays based on the integer in the 5th column, which takes values from 0 to N (0, 1, 2, ..., N).
So for N=2, the values in the 5th column will be 0, 1, 2,
and I would like to create 3 arrays containing the rows labelled 0, 1 and 2 respectively.
Here is my code in Python for 3 clusters (N=2):
df_array_with_clusters=...
for i in range(len(df_array_with_clusters)):
    if df_array_with_clusters[i, -1] == 0:
        cluster_0[i,:] = df_array_with_clusters[i, :-1]
    elif df_array_with_clusters[i, -1] == 1:
        cluster_1[i,:] = df_array_with_clusters[i, :-1]
    else:
        cluster_2[i,:] = df_array_with_clusters[i, :-1]
thanks
Upvotes: 1
Views: 271
Reputation: 12263
Something like this should work for you:
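def distribute_into_clusters(data, N):
    # N is the number of clusters, i.e. labels run from 0 to N - 1.
    clusters = [[] for _ in range(N)]
    for row in data:
        # Cast to int: the label is a float if data is a float array,
        # and a float cannot be used as a list index.
        cluster_id = int(row[-1])
        clusters[cluster_id].append(row[:-1])
    return clusters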
This returns a list of clusters, each of which is a list of rows (1-D np.array objects with the cluster ID column removed).
If you want each cluster to be an array instead, change the return statement to this:
    return [np.array(cluster) for cluster in clusters]
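To make the loop-based version concrete, here is a minimal sketch run on a small made-up array. The 4-row `data` array and the parameter name `n_clusters` are assumptions for illustration only; they are not from the question.

```python
import numpy as np

def distribute_into_clusters(data, n_clusters):
    # One empty bucket per cluster label 0 .. n_clusters - 1.
    clusters = [[] for _ in range(n_clusters)]
    for row in data:
        # The label may be stored as a float, so cast before indexing.
        clusters[int(row[-1])].append(row[:-1])
    # Convert each bucket from a list of rows to a 2-D array.
    return [np.array(cluster) for cluster in clusters]

# Hypothetical example: 2 feature columns plus a label column.
data = np.array([
    [1.0, 2.0, 0],
    [3.0, 4.0, 1],
    [5.0, 6.0, 0],
    [7.0, 8.0, 2],
])
clusters = distribute_into_clusters(data, 3)
```

Note that a cluster with no rows comes back as an empty array of shape (0,), so check for that before indexing its columns.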
Here's a second solution that selects the rows of each cluster with NumPy boolean indexing. It might be more efficient.
def distribute_into_clusters(data, N):
    return [
        data[[row[-1] == cluster_id for row in data]][:, :-1]
        for cluster_id in range(N)
    ]
[row[-1] == cluster_id for row in data] gives me a list of bools indicating which rows belong in cluster_id.
data[...] slices data, keeping only the rows where the bool is True.
[:,:-1] removes the cluster ID column.
Upvotes: 1
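As a possible further variant, the per-row list comprehension can be replaced by comparing the whole label column at once. This is only a sketch: the function name `split_by_label`, the dict return type and the sample array are assumptions, not part of the answer above. It reads the labels from the data with `np.unique`, so the number of clusters does not have to be passed in.

```python
import numpy as np

def split_by_label(data):
    # Last column holds the cluster label; cast in case it is stored as float.
    labels = data[:, -1].astype(int)
    # For each distinct label, keep the matching rows and drop the label column.
    return {int(k): data[labels == k, :-1] for k in np.unique(labels)}

# Hypothetical example: 2 feature columns plus a label column.
data = np.array([
    [1.0, 2.0, 0],
    [3.0, 4.0, 1],
    [5.0, 6.0, 0],
    [7.0, 8.0, 2],
])
groups = split_by_label(data)
```

Here `groups[0]` holds the two rows labelled 0. Labels that never occur in the data simply have no key in the result.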