Nick

How to plot a Power Iteration Clustering model using Matplotlib

I ran Spark's built-in Power Iteration Clustering on my dataset and got a model with:

from pyspark.mllib.clustering import PowerIterationClustering
model = PowerIterationClustering.train(similarities, 2, 10)  # k=2 clusters, maxIterations=10
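The question doesn't show how `similarities` was built. `PowerIterationClustering.train` expects an RDD of `(i, j, s_ij)` tuples whose similarities are nonnegative and symmetric, so only one of `(i, j)` and `(j, i)` needs to appear. A minimal sketch, assuming the data are 2-D points and that a Gaussian kernel is an acceptable similarity (both assumptions; `gaussian_similarities`, `sigma`, and the sample points are hypothetical):

```python
import math

def gaussian_similarities(points, sigma=1.0):
    """Build (i, j, s_ij) affinity tuples for PowerIterationClustering.train.

    Hypothetical helper. points: list of (x, y) coordinates; the list index
    is used as the point id. Only pairs with i < j are emitted, because the
    affinity matrix is symmetric.
    """
    tuples = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            (xi, yi), (xj, yj) = points[i], points[j]
            dist_sq = (xi - xj) ** 2 + (yi - yj) ** 2
            # Gaussian (RBF) kernel: always nonnegative, close to 1 for nearby points
            s = math.exp(-dist_sq / (2 * sigma ** 2))
            tuples.append((i, j, s))
    return tuples

# Two tight, well-separated pairs of points (made-up sample data)
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
sims = gaussian_similarities(points)

# In PySpark the tuples would then be distributed, e.g.
# similarities = sc.parallelize(sims)  # sc: an existing SparkContext
```

The list index doubles as the point id here, which is what later shows up as `id` in the model's assignments.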

When I call

model.assignments().collect()

I get all of the cluster assignments.

Now I want to draw a scatter plot of the result with Matplotlib, but I don't understand how. As far as I can tell, x and y in the call below should be the id and cluster values from the model:

import matplotlib.pyplot as plt
plt.scatter(x, y, s=area, c=colors, alpha=0.5)

What should I pass for area and colors?
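In `plt.scatter`, `s` is the marker area in points² (either one number for every point or one value per point), and `c` can be a list of numbers that Matplotlib maps to colors through a colormap. Neither is required, but a fixed `area` and the cluster labels as `colors` is a reasonable choice. A minimal sketch, with hypothetical `(id, cluster)` pairs standing in for the collected result and `viridis` as an arbitrary colormap:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in an interactive session
import matplotlib.pyplot as plt

# Hypothetical collected result: (id, cluster) pairs
assignments = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 0)]
x = [point_id for point_id, _ in assignments]
y = [cluster for _, cluster in assignments]

area = 40     # marker area in points**2; a single value applies to every point
colors = y    # one number per point, mapped to a color through the colormap

fig, ax = plt.subplots()
collection = ax.scatter(x, y, s=area, c=colors, cmap="viridis", alpha=0.5)
ax.set_xlabel("id")
ax.set_ylabel("cluster")
fig.savefig("pic_scatter.png")
```

Passing a list instead of a scalar for `area` would give each point its own size, for example to encode some per-point weight.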

Answers (1)

manjam

First map each Assignment to a plain tuple, then collect. Each collected element will then look like

(<id int>, <cluster int>)

Instead of

Assignment(id=..., cluster=...)

You can do this with:

model.assignments().map(lambda asm: (asm[0], asm[1])).collect()
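The same conversion can be tried outside Spark on a local list. This sketch assumes that `Assignment` is a namedtuple with fields `id` and `cluster`, which matches its `Assignment(id=..., cluster=...)` repr; the stand-in class and the sample values are hypothetical. Because it is a namedtuple, the fields can also be read by name:

```python
from collections import namedtuple

# Stand-in for PySpark's Assignment type (assumed to be a namedtuple
# with fields id and cluster)
Assignment = namedtuple("Assignment", ["id", "cluster"])

# Hypothetical stand-in for what collect() returns before the map()
collected = [Assignment(id=0, cluster=1), Assignment(id=1, cluster=0)]

# Same conversion as the map() above, applied to a local list
pairs = [(asm[0], asm[1]) for asm in collected]

# Equivalent, using the field names instead of positions
pairs_by_name = [(asm.id, asm.cluster) for asm in collected]

print(pairs)  # [(0, 1), (1, 0)]
```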

You can then extract the x and y from the resulting list of tuples.
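One way to do the extraction is `zip(*pairs)`, which splits a list of `(id, cluster)` tuples into an `x` sequence and a `y` sequence. The sketch below, using hypothetical sample pairs, then plots each cluster with its own `scatter` call so that every cluster gets a legend entry; this per-cluster loop is a design choice, not something the answer prescribes, and a single call with `c=y` works as well:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; omit in a notebook or GUI session
import matplotlib.pyplot as plt

# Hypothetical output of the collect() above: (id, cluster) pairs
pairs = [(0, 0), (1, 1), (2, 0), (3, 1), (4, 1)]

# Unzip the list of tuples into the x (id) and y (cluster) sequences
x, y = zip(*pairs)

fig, ax = plt.subplots()
# One scatter call per cluster so each cluster gets its own color and legend entry
for cluster in sorted(set(y)):
    ids = [i for i, c in zip(x, y) if c == cluster]
    ax.scatter(ids, [cluster] * len(ids), s=30, alpha=0.5, label=f"cluster {cluster}")
ax.set_xlabel("id")
ax.set_ylabel("cluster")
ax.legend()
fig.savefig("pic_clusters.png")
```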
