Would you use if/else to write this algorithm in Haskell? Is there a way to express it without them? It's hard to extract functions out of the middle that have meaning. This is just the output of a machine learning system.
I'm implementing the algorithm described here for classifying segments of HTML content as Content or Boilerplate. The weights are already hard-coded:
curr_linkDensity <= 0.333333
| prev_linkDensity <= 0.555556
| | curr_numWords <= 16
| | | next_numWords <= 15
| | | | prev_numWords <= 4: BOILERPLATE
| | | | prev_numWords > 4: CONTENT
| | | next_numWords > 15: CONTENT
| | curr_numWords > 16: CONTENT
| prev_linkDensity > 0.555556
| | curr_numWords <= 40
| | | next_numWords <= 17: BOILERPLATE
| | | next_numWords > 17: CONTENT
| | curr_numWords > 40: CONTENT
curr_linkDensity > 0.333333: BOILERPLATE
Without simplifying the logic by hand (assuming you might generate this code automatically), I think the MultiWayIf extension is pretty clean and direct:
```haskell
{-# LANGUAGE MultiWayIf #-}

data Stats = Stats {
  curr_linkDensity :: Double,
  prev_linkDensity :: Double,
  ...
}

data Classification = Content | Boilerplate

-- Each nested `if` opens its own layout block, so indentation decides
-- which guard belongs to which level of the tree.
classify :: Stats -> Classification
classify s = if
  | curr_linkDensity s <= 0.333333 -> if
      | prev_linkDensity s <= 0.555556 -> if
          | curr_numWords s <= 16 -> if
              | next_numWords s <= 15 -> if
                  | prev_numWords s <= 4 -> Boilerplate
                  | prev_numWords s > 4  -> Content
              | next_numWords s > 15 -> Content
          ...
```

and so on.
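For reference, here is the whole tree from the question written out with MultiWayIf as a self-contained sketch. The field types (`Double` for the link densities, `Int` for the word counts), the `deriving` clauses, and the `main` demo are assumptions added so the module compiles on its own; `otherwise` stands in for the repeated negated tests.

```haskell
{-# LANGUAGE MultiWayIf #-}

-- Assumed field types: densities as Double, word counts as Int.
data Stats = Stats
  { curr_linkDensity :: Double
  , prev_linkDensity :: Double
  , curr_numWords    :: Int
  , prev_numWords    :: Int
  , next_numWords    :: Int
  } deriving Show

data Classification = Content | Boilerplate
  deriving (Show, Eq)

-- Direct transcription of the question's tree; deeper indentation = deeper node.
classify :: Stats -> Classification
classify s = if
  | curr_linkDensity s <= 0.333333 -> if
      | prev_linkDensity s <= 0.555556 -> if
          | curr_numWords s <= 16 -> if
              | next_numWords s <= 15 -> if
                  | prev_numWords s <= 4 -> Boilerplate
                  | otherwise            -> Content
              | otherwise -> Content
          | otherwise -> Content
      | otherwise -> if
          | curr_numWords s <= 40 -> if
              | next_numWords s <= 17 -> Boilerplate
              | otherwise             -> Content
          | otherwise -> Content
  | otherwise -> Boilerplate

main :: IO ()
main = do
  -- A short block surrounded by short blocks (hypothetical input).
  let short = Stats { curr_linkDensity = 0.1, prev_linkDensity = 0.1
                    , curr_numWords = 10, prev_numWords = 2, next_numWords = 10 }
  print (classify short)                             -- Boilerplate
  print (classify (short { curr_numWords = 100 }))   -- Content
```

Because each `if` opens a layout block, a guard written at a shallower indentation closes the inner blocks and continues an outer one, which is what keeps the nesting readable.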
However, since this is so structured (just a tree of if/else with comparisons), also consider creating a decision tree data structure and writing an interpreter for it. This lets you transform, manipulate, and inspect the tree as data. Maybe it will buy you something; defining miniature languages for your specifications can be surprisingly beneficial.
```haskell
-- Comparison f v ifLess ifGreater: go to ifLess when f i <= v, else ifGreater.
data DecisionTree i o
  = Comparison (i -> Double) Double (DecisionTree i o) (DecisionTree i o)
  | Leaf o

runDecisionTree :: DecisionTree i o -> i -> o
runDecisionTree (Comparison f v ifLess ifGreater) i
  | f i <= v  = runDecisionTree ifLess i
  | otherwise = runDecisionTree ifGreater i
runDecisionTree (Leaf o) _ = o

-- DecisionTree is an encoding of a function, and you can write
-- Functor, Applicative, and Monad instances!
```
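The comment's claim can be made concrete. Below is a minimal sketch of the three instances (not part of the original answer): `fmap` relabels the leaves, and `>>=` grafts a new subtree onto every leaf, so running a combined tree is the same as running the first tree and then the tree chosen by its result, on the same input. The small trees in `main` are hypothetical examples.

```haskell
data DecisionTree i o
  = Comparison (i -> Double) Double (DecisionTree i o) (DecisionTree i o)
  | Leaf o

runDecisionTree :: DecisionTree i o -> i -> o
runDecisionTree (Comparison f v ifLess ifGreater) i
  | f i <= v  = runDecisionTree ifLess i
  | otherwise = runDecisionTree ifGreater i
runDecisionTree (Leaf o) _ = o

-- fmap relabels every leaf; the comparisons are left untouched.
instance Functor (DecisionTree i) where
  fmap g (Leaf o)             = Leaf (g o)
  fmap g (Comparison f v l r) = Comparison f v (fmap g l) (fmap g r)

-- pure is a single leaf; <*> is derived from >>= (i.e. `ap`).
instance Applicative (DecisionTree i) where
  pure = Leaf
  tf <*> tx = tf >>= \g -> fmap g tx

-- >>= replaces each leaf with the subtree chosen by its label.
instance Monad (DecisionTree i) where
  Leaf o             >>= k = k o
  Comparison f v l r >>= k = Comparison f v (l >>= k) (r >>= k)

main :: IO ()
main = do
  let t = Comparison id 0.5 (Leaf "low") (Leaf "high")
  print (runDecisionTree (fmap length t) 0.2)   -- 3
  -- Refine only the "low" leaf with a further split.
  let t2 = t >>= \s -> if s == "low"
                         then Comparison id 0.1 (Leaf "tiny") (Leaf "small")
                         else Leaf s
  print (runDecisionTree t2 0.05)   -- "tiny"
  print (runDecisionTree t2 0.3)    -- "small"
  print (runDecisionTree t2 0.9)    -- "high"
```

In other words, `runDecisionTree` maps this monad onto the reader monad `(->) i`, which is the sense in which the tree "encodes a function".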
Then
```haskell
classifier :: DecisionTree Stats Classification
classifier =
  Comparison curr_linkDensity 0.333333
    (Comparison prev_linkDensity 0.555556
      (Comparison curr_numWords 16
        (Comparison next_numWords 15
          (Comparison prev_numWords 4
            (Leaf Boilerplate)
            (Leaf Content))
          (Leaf Content))
        ...
```
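Filling in the rest of the question's tree gives a complete, runnable version. This sketch assumes every feature is stored as `Double`, since `Comparison` takes an `i -> Double` accessor. `leaves` is a hypothetical helper showing the kind of inspection a data representation allows; here it confirms that exactly three leaves are BOILERPLATE.

```haskell
-- Assumption: all features are Double so they fit the (i -> Double) accessors.
data Stats = Stats
  { curr_linkDensity :: Double
  , prev_linkDensity :: Double
  , curr_numWords    :: Double
  , prev_numWords    :: Double
  , next_numWords    :: Double
  }

data Classification = Content | Boilerplate
  deriving (Show, Eq)

data DecisionTree i o
  = Comparison (i -> Double) Double (DecisionTree i o) (DecisionTree i o)
  | Leaf o

runDecisionTree :: DecisionTree i o -> i -> o
runDecisionTree (Comparison f v ifLess ifGreater) i
  | f i <= v  = runDecisionTree ifLess i
  | otherwise = runDecisionTree ifGreater i
runDecisionTree (Leaf o) _ = o

-- The full tree from the question: first subtree is "<=", second is ">".
classifier :: DecisionTree Stats Classification
classifier =
  Comparison curr_linkDensity 0.333333
    (Comparison prev_linkDensity 0.555556
      (Comparison curr_numWords 16
        (Comparison next_numWords 15
          (Comparison prev_numWords 4
            (Leaf Boilerplate)
            (Leaf Content))
          (Leaf Content))
        (Leaf Content))
      (Comparison curr_numWords 40
        (Comparison next_numWords 17
          (Leaf Boilerplate)
          (Leaf Content))
        (Leaf Content)))
    (Leaf Boilerplate)

-- Inspection without running the tree: collect leaf labels left to right.
leaves :: DecisionTree i o -> [o]
leaves (Leaf o)             = [o]
leaves (Comparison _ _ l r) = leaves l ++ leaves r

main :: IO ()
main = do
  -- Positional fields: curr_ld, prev_ld, curr_nw, prev_nw, next_nw.
  print (runDecisionTree classifier (Stats 0.1 0.1 10 2 10))     -- Boilerplate
  print (length (filter (== Boilerplate) (leaves classifier)))  -- 3
```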
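Since only three paths in this decision tree lead to a BOILERPLATE state, I'd just enumerate them and simplify. The `curr_linkDensity <= 0.333333` test can be dropped from the first two paths, because whenever it fails, the third disjunct already holds:

```haskell
-- (&&) binds tighter than (||), so each line is one path to BOILERPLATE.
isBoilerplate s =
     prev_linkDensity s <= 0.555556 && curr_numWords s <= 16 && next_numWords s <= 15 && prev_numWords s <= 4
  || prev_linkDensity s >  0.555556 && curr_numWords s <= 40 && next_numWords s <= 17
  || curr_linkDensity s > 0.333333
```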
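A hand simplification like this is easy to get wrong, so it may be worth checking it against a direct transcription of the tree on inputs that straddle every threshold. In this sketch, `treeSaysBoilerplate`, `grid`, and the `Stats` field types are assumptions for illustration. Note that the first disjunct must keep the `next_numWords <= 15` test, since it lies on that path; without it, an input such as `Stats 0.1 0.1 10 2 16` would be wrongly flagged as boilerplate and the check would print `False`.

```haskell
-- Assumed field types; positional order: curr_ld, prev_ld, curr_nw, prev_nw, next_nw.
data Stats = Stats
  { curr_linkDensity :: Double
  , prev_linkDensity :: Double
  , curr_numWords    :: Int
  , prev_numWords    :: Int
  , next_numWords    :: Int
  }

-- Direct transcription of the tree: True means BOILERPLATE.
treeSaysBoilerplate :: Stats -> Bool
treeSaysBoilerplate s
  | curr_linkDensity s > 0.333333 = True
  | prev_linkDensity s <= 0.555556 =
      curr_numWords s <= 16 && next_numWords s <= 15 && prev_numWords s <= 4
  | otherwise =
      curr_numWords s <= 40 && next_numWords s <= 17

-- The simplified predicate: one disjunct per BOILERPLATE path.
isBoilerplate :: Stats -> Bool
isBoilerplate s =
     prev_linkDensity s <= 0.555556 && curr_numWords s <= 16 && next_numWords s <= 15 && prev_numWords s <= 4
  || prev_linkDensity s >  0.555556 && curr_numWords s <= 40 && next_numWords s <= 17
  || curr_linkDensity s > 0.333333

-- Values just below, at, and just above each threshold.
grid :: [Stats]
grid =
  [ Stats cl pl cw pw nw
  | cl <- [0.3, 0.333333, 0.4]
  , pl <- [0.5, 0.555556, 0.6]
  , cw <- [15, 16, 17, 40, 41]
  , pw <- [3, 4, 5]
  , nw <- [14, 15, 16, 17, 18]
  ]

main :: IO ()
main = print (all (\s -> isBoilerplate s == treeSaysBoilerplate s) grid)  -- True
```

A grid check is not a proof, but here the two predicates are equivalent by the absorption law `a || (not a && b) == a || b`, and the grid exercises both sides of every comparison.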