David Barda

Reputation: 1010

Haskell polymorphism and typeclass instance

I am trying to write a machine learning library in Haskell to improve my Haskell skills. I thought about a general design involving a class like this:

  class Classifier classifier where
    train :: x -> y -> trainingData
    classify :: trainingData -> x -> y

The idea is that, given a set of examples X and their true labels y, train returns trainingData, which is then used by the classify function.

So, if I want to implement KNN, I would do it like so:

  data KNN = KNN Int (Int -> Int -> Float)

where the first Int is the number of neighbors and the function is the metric that calculates the distance between the vectors.

  instance Classifier KNN where
---This is where I am stuck---

How can I implement the Classifier type class functions so that they are generic to all of the classifiers that I will create? I feel like I am treating Haskell too much like an imperative, OOP-like language, and I'd like to do this the Haskell way.

Upvotes: 2

Views: 218

Answers (2)

n. m. could be an AI

Reputation: 120079

I would say you need multi-parameter type classes (optionally with functional dependencies or type families; I omit those here).

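 {-# LANGUAGE MultiParamTypeClasses, AllowAmbiguousTypes #-}
 -- AllowAmbiguousTypes: combine mentions neither s nor l, so GHC rejects it
 -- without this (or without a functional dependency that determines s and l).
 class Classifier c s l k where
      train :: c -> [(s, l)] -> k
      classify :: c -> k -> s -> l
      combine :: c -> k -> k -> k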

There is a four-sided relationship between classifier, sample, label and knowledge types.

The train method derives some knowledge (k) from a list of sample/label pairs of type (s, l). The classify method uses that knowledge to infer a label for a sample. (The combine method joins two pieces of knowledge together; I don't know whether it always applies.)
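A minimal sketch of a KNN instance for this class follows. It makes a few assumptions that are not in the answer: samples are feature vectors of type [Float] (instead of the question's Int -> Int -> Float metric), the functional dependency c k -> s l is added (one of the options mentioned above) so that combine type-checks and call sites can infer s and l, and KNNModel, majority and euclidean are hypothetical helper names.

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

import Data.List (sortOn, maximumBy)
import Data.Ord (comparing)
import qualified Data.Map.Strict as Map

-- The four-parameter class from the answer. The dependency c k -> s l is an
-- assumption added here: it lets GHC accept combine (which mentions neither
-- s nor l) and infer the sample and label types at call sites.
class Classifier c s l k | c k -> s l where
  train    :: c -> [(s, l)] -> k
  classify :: c -> k -> s -> l
  combine  :: c -> k -> k -> k

-- Hypothetical KNN: number of neighbours plus a metric on feature vectors.
data KNN = KNN Int ([Float] -> [Float] -> Float)

-- For KNN the "knowledge" is simply the stored labelled samples.
newtype KNNModel l = KNNModel [([Float], l)]

instance Ord l => Classifier KNN [Float] l (KNNModel l) where
  train _ = KNNModel
  -- Sort the training samples by distance to x, keep the n closest,
  -- and return the most frequent label among them.
  classify (KNN n dist) (KNNModel pts) x =
    majority (map snd (take n (sortOn (dist x . fst) pts)))
  -- Merging two KNN models just concatenates their training sets.
  combine _ (KNNModel a) (KNNModel b) = KNNModel (a ++ b)

-- Most frequent element of a non-empty list (ties go to the larger label).
majority :: Ord l => [l] -> l
majority ls = fst (maximumBy (comparing snd) (Map.toList counts))
  where counts = Map.fromListWith (+) [(l, 1 :: Int) | l <- ls]

-- Hypothetical metric: Euclidean distance between two vectors.
euclidean :: [Float] -> [Float] -> Float
euclidean a b = sqrt (sum [(p - q) ^ (2 :: Int) | (p, q) <- zip a b])
```

For instance, with knn = KNN 3 euclidean and a model trained on three points near (0, 0) labelled "a" and three near (5, 5) labelled "b", classify knn model [0.5, 0.5] returns "a". Because the knowledge type here is just the stored training set, combine can simply concatenate two models.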

Upvotes: 4

chepner

Reputation: 532238

Assuming your type class has no knowledge of what a classifier provides, you could do something like

class Classifier c where
  train :: [x] -> [y] -> c -> [(x,y)]
  classify :: [(x,y)] -> c -> x -> y

Here, train takes a list of samples of type x, a list of labels of type y, and a classifier of some type c, and returns a list of sample/label pairs.

classify takes a list of sample/label pairs (such as the list produced by train), the classifier, and a sample, and produces a new label.

(At the very least, though, I'd probably replace [(x,y)] with something like Map x y.)
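One way the Map x y suggestion could look is sketched below. Using Map adds an Ord x constraint to both methods (an assumption about how the class would be adjusted), and FloorLookup is a hypothetical toy classifier chosen only to show that the training data can then be queried with Map.lookupLE and Map.lookupGE.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- The class above with Map x y in place of [(x, y)].
-- Ord x is required because Map keys must be ordered.
class Classifier c where
  train    :: Ord x => [x] -> [y] -> c -> Map x y
  classify :: Ord x => Map x y -> c -> x -> y

-- Hypothetical classifier: use the label of the largest training sample <= x,
-- falling back to the smallest training sample >= x.
data FloorLookup = FloorLookup

instance Classifier FloorLookup where
  -- Pair each sample with its label; for duplicate samples the last label wins.
  train samples labels _ = Map.fromList (zip samples labels)
  classify td _ x =
    case Map.lookupLE x td of
      Just (_, y) -> y
      Nothing -> case Map.lookupGE x td of
        Just (_, y) -> y
        Nothing -> error "classify: empty training data"
```

With td = train [1, 5, 10] "abc" FloorLookup, classify td FloorLookup 7 returns 'b', the label of sample 5.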

The key is that the classifier itself is an argument to both train and classify (so the type c appears in both signatures), although the class doesn't need to know at this point how an instance will use it.

Your instance for KNN could then look like

instance Classifier KNN where

  train samples labels (KNN n f) = ...
  classify td (KNN n f) sample = ...

Here, n and f can be used both to create the training data, and to help pick the closest member of the training data for a sample point.
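As written, the method signatures are polymorphic in x, so a KNN instance could not actually apply its metric to the samples. Below is a sketch under one possible adjustment (an assumption, not part of the answer): the classifier type is indexed by the sample type, as in KNN x, and classify requires Ord y so that the neighbours can vote. The parameterized KNN and the helper mostCommon are hypothetical.

```haskell
import Data.List (sortOn, group, sort, maximumBy)
import Data.Ord (comparing)

-- Variant of the class in which the classifier type is indexed by the
-- sample type x, so an instance can apply its metric to samples.
class Classifier c where
  train    :: [x] -> [y] -> c x -> [(x, y)]
  classify :: Ord y => [(x, y)] -> c x -> x -> y

-- Hypothetical KNN over samples of type x: neighbour count and metric.
data KNN x = KNN Int (x -> x -> Float)

instance Classifier KNN where
  -- n and f are unused here: KNN's training data is just the labelled samples.
  train samples labels _ = zip samples labels
  -- Rank the training data by f-distance to the sample, keep the n closest,
  -- and return the most common label among them.
  classify td (KNN n f) sample =
    mostCommon (map snd (take n (sortOn (f sample . fst) td)))

-- Most common element of a non-empty list.
mostCommon :: Ord y => [y] -> y
mostCommon = head . maximumBy (comparing length) . group . sort
```

The question's KNN Int (Int -> Int -> Float) corresponds to KNN Int here. For example, with knn = KNN 3 (\a b -> fromIntegral (abs (a - b))) :: KNN Int and td = train [1, 2, 3, 10, 11, 12] "aaabbb" knn, classify td knn 2 returns 'a'.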

Upvotes: 3
