Reputation: 73
I ran Spark's built-in Power Iteration Clustering on my dataset and obtained the model with:
model = PowerIterationClustering.train(similarities, 2, 10)
When I call
model.assignments().collect()
I get all the values.
Now I want to draw a scatter plot of this model with Matplotlib, but I don't understand how. As far as I can tell, x and y in the code below correspond to the id and cluster in the model:
plt.scatter(x, y, s=area, c=colors, alpha=0.5)
But I don't understand how to use it. What should area and colors be?
Upvotes: 1
Views: 379
Reputation: 225
First map each Assignment to a plain tuple, then collect. The output will then be:
(<id int>, <cluster int>)
Instead of
Assignment(id=..,cluster=...)
You can do this with:
model.assignments().map(lambda asm: (asm[0], asm[1])).collect()
You can then extract the x and y from the resulting list of tuples.
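As a rough sketch of that last step, the snippet below splits the collected tuples into x and y and passes them to plt.scatter. The `pairs` list is made-up sample data standing in for the collected output, and the helper names `split_assignments` and `plot_assignments`, the marker size of 40, and the output file name are assumptions chosen for illustration. In Matplotlib, `s` is the marker area in points squared and `c` is the colour of each point, so passing the cluster labels as `c` is one way to colour points by cluster.

```python
# Hypothetical sample data standing in for the collected (id, cluster) tuples.
pairs = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 0)]


def split_assignments(pairs):
    """Split a list of (id, cluster) tuples into two parallel lists."""
    if not pairs:
        return [], []
    ids, clusters = zip(*pairs)
    return list(ids), list(clusters)


def plot_assignments(pairs, path="clusters.png"):
    """Scatter id against cluster, coloured by cluster, and save to a file."""
    # Imported here so split_assignments stays usable without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend that renders to files
    import matplotlib.pyplot as plt

    ids, clusters = split_assignments(pairs)
    fig, ax = plt.subplots()
    # s: marker area in points^2 (a scalar applies to every point).
    # c: one value per point, mapped to colours through the colormap.
    ax.scatter(ids, clusters, s=40, c=clusters, cmap="viridis", alpha=0.5)
    ax.set_xlabel("id")
    ax.set_ylabel("cluster")
    fig.savefig(path)
    plt.close(fig)


ids, clusters = split_assignments(pairs)
```

Calling `plot_assignments(pairs)` writes the figure to clusters.png. Since y only takes the cluster labels as values, the points fall on one horizontal line per cluster; plotting two of your original feature columns as x and y and keeping `c=clusters` may give a more informative picture.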
Upvotes: 1