Reputation: 457
I have the following Haskell polymorphic data type:
data Tree a = Leaf Int a | Node Int (Tree a) (Tree a)
The tree will be compressed in a bitstring of 0s and 1s. A '0' signifies a Node and it is followed by the encoding of the left subtree, then the encoding of the right subtree. A '1' signifies a Leaf and is followed by 7 bits of information (for example it might be a char). Each node/leaf is supposed to also contain the frequency of the information stored, but this is not important for this problem (so we can put anything there).
For example, starting from this encoded tree
[0,0,0,1,1,1,0,1,0,1,1,1,1,1,1,0,1,0,0,0,0,1,1,1,1,0,0,0,1,1,1,
1,0,0,1,1,1,1,1,1,1,0,0,1,0,0,1,1,1,1,0,1,1,1,1,1,1,0,0,0,0,1]
it is supposed to give back something like this
Node 0 (Node 0 (Node 0 (Leaf 0 'k') (Leaf 0 't'))
(Node 0 (Node 0 (Leaf 0 'q') (Leaf 0 'g')) (Leaf 0 'r')))
(Node 0 (Leaf 0 'w') (Leaf 0 'a'))
(spacing is not important, but it did not fit on one line).
I have little experience working with trees, especially when implementing code. I have a vague idea about how I'd solve this on paper (using something similar to a stack to deal with the depth/levels) but I am still a bit lost.
Any help or ideas are appreciated!
Upvotes: 4
Views: 528
Reputation: 44634
Well, you're trying to parse a tree of bytes from a bit-stream. Parsing's one of those cases where it pays to set up some structure: we're going to write a miniature parser combinator library in the style of How to Replace Failure by a List of Successes, which will allow us to write our code in an idiomatic functional style and delegate a lot of the work to the machine.
Translating the old rhyme into the language of monad transformers, and reading "string" as "bit-string", we have
newtype Parser a = Parser (StateT [Bool] [] a)
deriving (Functor, Applicative, Monad, Alternative)
runParser :: Parser a -> [Bool] -> [(a, [Bool])]
runParser (Parser m) = runStateT m
A parser is a monadic computation which operates statefully on a stream of Booleans, yielding a collection of successfully-parsed a
s. GHC's GeneralizedNewtypeDeriving
superpowers allow me to elide the boilerplate instances of Monad
et al.
The goal, then, is to write a Parser (Tree SevenBits)
- a parser which returns a tree of septuples of Booleans. (You can turn the 7 bits into a Word8
at your leisure by deriving a Functor
instance for Tree
and using fmap
.) I'm going to use the following definition of Tree
because it's simpler - I'm sure you can figure out how to adapt this code to your own ends.
data Tree a = Leaf a | Node (Tree a) (Tree a) deriving Show
type SevenBits = (Bool, Bool, Bool, Bool, Bool, Bool, Bool)
Here's a parser that attempts to consume a single bit from the input stream, failing if it's empty:
one :: Parser Bool
one = Parser $ do
stream <- get
case stream of
[] -> empty
(x:xs) -> put xs *> return x
Here's one which attempts to consume a particular bit from the input stream, failing if it doesn't match:
bit :: Bool -> Parser ()
bit b = do
i <- one
guard (i == b)
Here I'm pulling a sequence of seven Booleans from the input stream using replicateM
and packing them into a tuple. We'll be using this to populate Leaf
nodes' contents.
sevenBits :: Parser SevenBits
sevenBits = pack7 <$> replicateM 7 one
where pack7 [a,b,c,d,e,f,g] = (a, b, c, d, e, f, g)
Now we can finally write the code which parses the tree structure itself. We'll be choosing between the Node
and Leaf
alternatives using <|>
.
tree :: Parser (Tree SevenBits)
tree = node <|> leaf
where node = bit False *> liftA2 Node tree tree
leaf = bit True *> fmap Leaf sevenBits
If node
succeeds in parsing a low bit from the head of the stream, it continues to recursively parse the encoding of the left subtree followed by the right subtree, sequencing the applicative actions with liftA2
. The trick is that node
fails if it doesn't encounter a low bit at the head of the input stream, which tells <|>
to give up on node
and try leaf
instead.
Note how the structure of tree
reflects the structure of the Tree
type itself. This is applicative parsing at work. We could alternately have structured this parser monadically, first using one
to parse an arbitrary bit and then using a case
analysis on the bit to determine whether we should continue to parse a pair of trees or a leaf. In my opinion this version is simpler, more declarative, and less verbose.
Also compare the clarity of this code to the low-level style of @behzad.nouri's foldr
-based solution. Rather than building an explicit finite-state machine which switches between parsing nodes and leaves - an imperative-flavoured idea - my design allows you to declaratively describe the grammar to the machine using standard functions like liftA2
and <|>
and trust that the abstractions will do the right thing.
Anyway, here I'm parsing a simple tree consisting of a pair of Leaf
s containing the (binary-encoded) numbers 0
and 1
. As you can see, it returns the single successful parse and an empty stream of remaining bits.
ghci> runParser tree $ map (>0) [0, 1, 0,0,0,0,0,0,0, 1, 0,0,0,0,0,0,1]
[(Node (Leaf (False, False, False, False, False, False, False)) (Leaf (False, False, False, False, False, False, True)),[])]
Upvotes: 2
Reputation: 749
Ok, here's a simple (ad-hoc, but easier to understand) way.
We need to buid a function parse
, with the following type:
parse :: [Int] -> Tree Char
The approach you mentioned, with stacks, is the imperative one. Here we just lay on the recursive calls. The stack will be built by the compiler and it will just have each recursive call stored in it (At least you can imagine it that way, if you want, or just ignore all this paragraph).
So, the idea is the following: whenever you find a 0
, you need to make two recursive calls to the algorithm. The first recursive call will read one branch (the left one) of the tree. The second one needs to be called with the rest of the list as argument. The rest left by the first recursive call. So, we need a auxiliar function parse'
with the following type (now we return a pair, being the second value the rest of list):
parse' :: [Int] -> (Tree Char, [Int])
Next, you can see a piece of code where the 0
case is just as described before.
For the 1
case, we just need to take the next 7 numbers and make them into a char somehow (I leave the definition of toChar
for you), then, just return a Leaf
and the rest of the list.
parse' (0:xs) = let (l, xs') = parse' xs
(r, xs'') = parse' xs' in (Node 0 l r, xs'') --xs'' should be []
parse' (1:xs) = let w = toChar (take 7 xs) in (Leaf 0 w , drop 7 xs)
Finally, our parse function just calls the auxiliary parse one and returns the first element of the pair.
parse xs = fst $ parse' xs
Upvotes: 1
Reputation: 77961
do a right fold:
import Data.Char (chr)
data Tree a = Leaf a | Node (Tree a) (Tree a)
deriving Show
build :: [Int] -> [Tree Char]
build xs = foldr go (\_ _ -> []) xs 0 0
where
nil = Leaf '?'
go 0 run 0 0 = case run 0 0 of
[] -> Node nil nil:[]
x:[] -> Node x nil:[]
x:y:zs -> Node x y :zs
go 1 run 0 0 = run 0 1
go _ _ _ 0 = error "this should not happen!"
go x run v 7 = (Leaf $ chr (v * 2 + x)): run 0 0
go x run v k = run (v * 2 + x) (k + 1)
then:
\> head $ build [0,0,0,1,1,1,0, ...] -- the list of 01s as in the question
Node (Node (Node (Leaf 'k') (Leaf 't'))
(Node (Node (Leaf 'q') (Leaf 'g')) (Leaf 'r')))
(Node (Leaf 'w') (Leaf 'a'))
Upvotes: 1