Reputation: 111
I'm looking to port our home-grown platform of various machine learning algorithms from C# to a more robust data mining platform such as R. While it's obvious R is great at many types of data mining tasks, it is not clear to me if it can be used for text classification.
Specifically, we extract a list of bigrams from the text and then classify it into one of 15 different categories, eg:
Bigram list: jewelry, books, watches, shoes, department store -> Category: Shopping
We'd want to both train the models in R as well as hook up to a database to perform this on a larger scale.
Can it be done in R?
Upvotes: 3
Views: 324
Reputation: 28274
Hmm, I am rather starting to look into Machine Learning, but I might have a suggestion: have you considered Weka? There's a bunch of various algorithms around and there'S IS some documentation. Plus, there is an R package RWeka
that makes use of the Weka jars.
EDIT: There is also a nice, comprehensive read by Witten et al. : Data mining that contains an extensive description of Weka among other interesting things. Look into the API opportunities.
Upvotes: 1