Reputation: 1010
I am trying to write a machine learning library in Haskell, to work on my Haskell skills. I thought about a general design involving a class which is like so:
```haskell
class Classifier classifier where
    train :: X -> y -> trainingData
    classify :: trainingData -> x -> y
```
For example, given a set of examples X, and their true labels y, train returns trainingData which is used in the classify function.
So, if I want to implement KNN, I would do it like so:
```haskell
data KNN = KNN Int (Int -> Int -> Float)
```
where the first `Int` is the number of neighbors and the function is the metric that calculates the distance between two vectors.
```haskell
instance Classifier KNN where
    -- This is where I am stuck
```
How can I implement the Classifier type class functions so they are generic across all of the classifiers I will create? I feel like I am treating Haskell too much like an imperative, OOP-style language, and I'd like to do this the Haskell way.
Upvotes: 2
Views: 218
Reputation: 120079
I would say you need multi-parameter type classes (with optional functional dependencies, or type families; I omit those).
```haskell
class Classifier c s l k where
    train    :: c -> [(s, l)] -> k
    classify :: c -> k -> s -> l
    combine  :: c -> k -> k -> k
```
There is a four-way relationship between the classifier, sample, label, and knowledge types.
The `train` method derives some knowledge (`k`) from a list of sample/label pairs (types `s` and `l`). The `classify` method uses that knowledge to infer a label for a sample. (The `combine` method joins two pieces of knowledge together; I don't know if it always applies.)
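A minimal sketch of how an instance might look, using a made-up nearest-neighbour classifier (`NearestNeighbor`, `Double` samples, `String` labels — all names hypothetical, not from the question). Note that without functional dependencies or type families, the knowledge type has to be pinned down with annotations at the use site:

```haskell
{-# LANGUAGE MultiParamTypeClasses #-}
import Data.List (minimumBy)
import Data.Ord (comparing)

class Classifier c s l k where
    train    :: c -> [(s, l)] -> k
    classify :: c -> k -> s -> l
    combine  :: c -> k -> k -> k

-- Hypothetical classifier: label a point with the label of its nearest
-- training sample. The "knowledge" is just the training set itself.
data NearestNeighbor = NearestNeighbor

instance Classifier NearestNeighbor Double String [(Double, String)] where
    train _ pairs      = pairs
    classify _ known x = snd (minimumBy (comparing (\(s, _) -> abs (s - x))) known)
    combine _          = (++)

example :: String
example =
    let knowledge = train NearestNeighbor [(1.0, "low"), (10.0, "high")]
                      :: [(Double, String)]  -- annotation needed without fundeps
    in classify NearestNeighbor knowledge (2.0 :: Double)
```

The annotations in `example` are exactly the ambiguity that functional dependencies (or type families) would remove, which is presumably why the answer mentions them.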
Upvotes: 4
Reputation: 532238
Assuming your type class has no knowledge of what a classifier provides, you could do something like
```haskell
class Classifier c where
    train :: [x] -> [y] -> c -> [(x,y)]
    classify :: [(x,y)] -> c -> x -> y
```
Here, `train` is getting a list of samples of type `x`, a list of labels of type `y`, and a classifier of some type `c`, and needs to return a list of sample/label pairs. `classify` takes a list of sample/label pairs (such as that produced by `train`), the classifier, and a sample, and produces a new label. (At the very least, though, I'd probably replace `[(x,y)]` with something like `Map x y`.)
The key is that the classifier itself needs to be used by both `train` and `classify`, although you don't need to know what that would look like at this time.
Your instance for `KNN` could then look like
```haskell
instance Classifier KNN where
    train samples labels (KNN n f) = ...
    classify td (KNN n f) sample = ...
```
Here, `n` and `f` can be used both to create the training data and to help pick the closest member of the training data for a sample point.
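One hedged sketch of what those bodies might look like. Because `classify`'s signature here quantifies over every `x` and `y`, an actual instance body can't apply the `Int -> Int -> Float` metric stored in `KNN` to an arbitrary `x`, so this sketch uses standalone helpers with the sample type fixed to `Int`, and keeps the simple nearest-neighbour case rather than a full k-way vote:

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

data KNN = KNN Int (Int -> Int -> Float)

-- Training is just pairing samples with labels.
trainKNN :: [x] -> [y] -> KNN -> [(x, y)]
trainKNN samples labels _ = zip samples labels

-- Classification picks the label of the training sample nearest to the
-- query point under the stored metric (k = 1 for simplicity; a fuller
-- version would majority-vote over the k nearest neighbours).
classifyKNN :: [(Int, y)] -> KNN -> Int -> y
classifyKNN td (KNN _ dist) x =
    snd (minimumBy (comparing (\(s, _) -> dist s x)) td)
```

For example, with `knn = KNN 1 (\a b -> fromIntegral (abs (a - b)))`, the call `classifyKNN (trainKNN [1, 10] ["low", "high"] knn) knn 2` picks the label of the nearer sample. Making this work through the type class itself would mean constraining or fixing the sample type, which is essentially what the multi-parameter-class answer above does.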
Upvotes: 3