VagrantC

Reputation: 827

Machine Learning clustering with n-dimensional data in Python

I'm trying to work out how to cluster a data set with 52 dimensions. This is purely for my own learning, so I'm using a data set whose fields I already know: the World Series game logs from retrosheet.org. I'm only using columns 25-77, which hold the integer fields, and ignoring the string data.

This is my first attempt at unsupervised learning, and while I understand the concepts, I'm struggling to implement a solution in Python. I've been using scipy and numpy. If anyone can suggest a good place to start or a way to tackle this problem, I'd appreciate it.

Upvotes: 0

Views: 5540

Answers (1)

user4322779

Reputation:

scikit-learn is the way to go for clustering in Python. See http://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_digits.html#example-cluster-plot-kmeans-digits-py for a demo, with code, of k-means clustering on data with 64 features. A good way to start is to work through the tutorial at http://scikit-learn.org/stable/tutorial/basic/tutorial.html, apply what you learn there to your dataset, and then move on to k-means clustering.
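
To give a rough idea of what that looks like for a 52-column numeric table, here is a minimal sketch. It uses random integers as a stand-in for the game log columns, and the helper name `cluster_rows`, the choice of 3 clusters, and the standardization step are my own assumptions rather than anything taken from the linked demo.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_rows(X, n_clusters=3, seed=0):
    """Standardize each column, then assign every row to a k-means cluster.

    Hypothetical helper for illustration; n_clusters=3 is an arbitrary choice.
    """
    # Put all columns on a comparable scale so that large-valued
    # columns do not dominate the Euclidean distances k-means uses.
    X_scaled = StandardScaler().fit_transform(X)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(X_scaled)
    return labels, model


# Stand-in data: 200 rows by 52 integer-valued columns.
# Replace this with the numeric columns of the real data set.
rng = np.random.default_rng(0)
X = rng.integers(0, 20, size=(200, 52)).astype(float)

labels, model = cluster_rows(X, n_clusters=3)
print(labels.shape)                  # one cluster label per row
print(model.cluster_centers_.shape)  # one 52-dimensional center per cluster
```

With the real file you would build `X` from the numeric columns instead, for example with `np.genfromtxt` and its `delimiter` and `usecols` arguments; the exact column indices depend on how the file is laid out, so check them against the field description first. The number of clusters is something you will need to experiment with.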

Upvotes: 1
