Y.B.

Reputation: 3596

Machine Learning Multiclass Classification for thousands of Classes

I have a few million Entities with 1 to 10 attributes describing each of them and about one hundred thousand Classes to sort them into.

Are there any Machine Learning algorithms (ideally available on SQL Server, Azure, or as a .NET library) or stand-alone tools for massive Multiclass Classification capable of suggesting the top few best-matching Classes for each Entity?

I have found research along these lines, "Learning compact class codes for fast inference in large multi class classification", but could not find any implementations.

At the moment I have a sort of K-nearest-neighbours approach based on Full-Text Search, with a couple of other dimensions weighted at 1/3 each to improve the results. I am looking for ways to improve both performance and accuracy.
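For readers trying to picture the current setup, here is a minimal Python sketch of one way such a weighted nearest-neighbour scheme could suggest the top few Classes. It assumes the full-text relevance and two other dimensions have each been normalised to [0, 1] and are blended at 1/3 each. The names `combined_score` and `top_classes`, and the shape of the neighbour tuples, are hypothetical and not taken from the actual system.

```python
from collections import defaultdict


def combined_score(text_sim, dim_a_sim, dim_b_sim, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Blend three similarities (each assumed to lie in [0, 1]) into one score."""
    return weights[0] * text_sim + weights[1] * dim_a_sim + weights[2] * dim_b_sim


def top_classes(neighbours, k=10, top_n=3):
    """Suggest the best-matching classes for one entity.

    neighbours: iterable of (class_label, text_sim, dim_a_sim, dim_b_sim)
    tuples, one per already-classified candidate entity (hypothetical shape).
    """
    # Keep only the k candidates with the highest blended score.
    scored = sorted(
        ((combined_score(t, a, b), label) for label, t, a, b in neighbours),
        reverse=True,
    )[:k]
    # Each retained neighbour votes for its class, weighted by its score.
    votes = defaultdict(float)
    for score, label in scored:
        votes[label] += score
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


candidates = [
    ("A", 0.9, 0.6, 0.3),
    ("B", 0.3, 0.3, 0.3),
    ("A", 0.6, 0.6, 0.6),
    ("C", 0.9, 0.9, 0.9),
]
suggestions = top_classes(candidates, k=10, top_n=2)
print([label for label, _ in suggestions])
```

Summing scores per class rather than counting neighbours lets a single very close match outweigh several weak ones; whether that suits the data is a design choice to validate.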

Upvotes: 4

Views: 2490

Answers (1)

Have you tried ensemble learning? The idea is to build multiple "weak" multiclass classifiers and reach a consensus through majority voting. The main advantage is that you can randomly sample your dataset so that each classifier learns from a different subset.

You can also try deep learning with Convolutional Neural Networks implemented in TensorFlow or Theano (I would recommend the latter). If you have a GPU, you can use its processing power to speed up the training step. The code at https://github.com/attardi/CNN_sentence uses GPU processing, the Theano library, and multiclass classification (for NLP applications), but it is not in C# as you asked.
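To make the ensemble suggestion concrete, here is a minimal Python sketch of bootstrap sampling plus majority voting. The weak learner is a nearest-centroid classifier over numeric features, picked only because it fits in a few lines; it is an assumption, not something this answer prescribes, and all function names are hypothetical.

```python
import random
from collections import Counter


def train_centroids(samples):
    """Weak learner: compute one mean vector (centroid) per class."""
    sums, counts = {}, Counter()
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(centroids, features):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: dist2(centroids[label]))


def train_ensemble(data, n_models=15, seed=0):
    """Train each weak learner on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [
        train_centroids([rng.choice(data) for _ in data])
        for _ in range(n_models)
    ]


def vote(ensemble, features, top_n=3):
    """Majority voting: return the top_n classes with their vote counts."""
    ballots = Counter(predict_centroid(model, features) for model in ensemble)
    return ballots.most_common(top_n)


data = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((5.0, 5.0), "b"), ((5.1, 4.9), "b"), ((4.9, 5.2), "b"),
    ((10.0, 0.0), "c"), ((9.8, 0.1), "c"), ((10.1, -0.2), "c"),
]
ensemble = train_ensemble(data, n_models=15, seed=0)
ranking = vote(ensemble, (5.0, 5.1))
print(ranking)
```

Because `vote` returns the vote counts rather than a single winner, the same ranking can supply the "top few" classes the question asks for. With around one hundred thousand classes, each weak learner would need to be far cheaper per prediction than this toy version.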

Upvotes: 2
