Reputation: 751
I'm extracting some data from a text document organized like this:
- "day 1"
- "Person 1"
- "Bill 1"
- "Person 2"
- "Bill 2"
I can read this into a list of tuples that looks like this:
[(0,["day 1"]),(1,["Person 1"]),(2,["Bill 1"]),(1,["Person 2"]),(2,["Bill 2"])]
Where the first item of each tuple indicates the heading level, and the second item the information associated with each heading.
My question is, how can I get a list of items that looks like this:
[["day 1","Person 1","Bill 1"],["day 1","Person 2","Bill 2"]]
I.e. one list per deepest nested item, containing all the information from the headings above it. The closest I've gotten is this:
f [] = []
f (x:xs) = row:f rest where
leaves = takeWhile (\i -> fst i > fst x) xs
rest = dropWhile (\i -> fst i > fst x) xs
row = concat $ map (\i -> (snd x):[snd i]) leaves
Which gives me this:
[[["day 1"],["Intro 1"],["day 1"],["Bill 1"],["day 1"],["Intro 2"],["day 1"],["Bill 2"]]]
I'd like the solution to work for any number of levels.
P.s. I'm new to Haskell. I have a sense that I could/should use a tree to store the data, but I can't wrap my head around it. I also could not think of a better title.
Upvotes: 4
Views: 1350
Reputation: 35089
You were right that you should probably use a tree to store the data. I'll copy how Data.Tree
does it:
data Tree a = Node a (Forest a) deriving (Show)
type Forest a = [Tree a]
Now we want to take your weakly typed list of tuples and convert it to a (slightly) stronger Tree
of String
s. Any time you need to convert a weakly typed value and validate it before converting to a stronger type, you use a Parser
:
type YourData = [(Int, [String])]
type Parser a = YourData -> Maybe (a, YourData)
The YourData
type synonym represents the weak type that you are parsing. The a
type variable is the value you are retrieving from the parse. Our Parser
type returns a Maybe
because the Parser
might fail. To see why, the following input does not correspond to a valid Tree
, since it is missing level 1 of the tree:
[(0, ["val1"]), (2, ["val2"])]
If the Parser
does succeed, it also returns the unconsumed input so that subsequent parsing stages can use it.
Now, curiously enough, the above Parser
type exactly matches a well known monad transformer stack:
StateT s Maybe a
You can see this if you expand out the underlying implementation of StateT
:
StateT s Maybe a ~ s -> Maybe (a, s)
This means we can just define:
import Control.Monad.Trans.State.Strict
type Parser a = StateT [(Int, [String])] Maybe a
If we do this, we get a Monad
, Applicative
and Alternative
instance for our Parser
type for free. This makes it very easy to define parsers!
First, we must define a primitive parser that consumes a single node of the tree:
parseElement :: Int -> Parser String
parseElement level = StateT $ \list -> case list of
[] -> Nothing
(level', strs):rest -> case strs of
[str] ->
if (level' == level)
then Just (str, rest)
else Nothing
_ -> Nothing
This is the only non-trivial piece of code we have to write, which, because it is total, handles all the following corner cases:
The next part is where things get really elegant. We can then define two mutually recursive parsers, one for parsing a Tree
, and the other for parsing a Forest
:
import Control.Applicative
parseTree :: Int -> Parser (Tree String)
parseTree level = Node <$> parseElement level <*> parseForest (level + 1)
parseForest :: Int -> Parser (Forest String)
parseForest level = many (parseTree level)
The first parser uses Applicative
style, since StateT
gave us an Applicative
instance for free. However, I could also have used StateT
's Monad
instance instead, to give code that's more readable for an imperative programmer:
parseTree :: Int -> Parser (Tree String)
parseTree level = do
str <- parseElement level
forest <- parseForest (level + 1)
return $ Node str forest
But what about the many
function? What's that doing? Let's look at its type:
many :: (Alternative f) => f a -> f [a]
It takes anything that returns a value and implements Applicative
and instead calls it repeatedly to return a list of values instead. When we defined our Parser
type in terms of State
, we got an Alternative
instance for free, so we can use the many
function to convert something that parses a single Tree
(i.e. parseTree
), into something that parses a Forest
(i.e. parseForest
).
To use our Parser
, we just rename an existing StateT
function to make its purpose clear:
runParser :: Parser a -> [(Int, [String])] -> Maybe a runParser = evalStateT
Then we just run it!
>>> runParser (parseForest 0) [(0,["day 1"]),(1,["Person 1"]),(2,["Bill 1"]),(1,["Person 2"]),(2,["Bill 2"])]
Just [Node "day 1" [Node "Person 1" [Node "Bill 1" []],Node "Person 2" [Node "Bill 2" []]]]
That's just magic! Let's see what happens if we give it an invalid input:
>>> runParser (parseForest 0) [(0, ["val1"]), (2, ["val2"])]
Just [Node "val1" []]
It succeeds on a portion of the input! We can actually specify that it must consume the entire input by defining a parser that matches the end of the input:
eof :: Parser ()
eof = StateT $ \list -> case list of
[] -> Just ((), [])
_ -> Nothing
Now let's try it:
>>> runParser (parseForest 0 >> eof) [(0, ["val1"]), (2, ["val2"])]
Nothing
Perfect!
To answer your second question, we again solve the problem using mutually recursive functions:
flattenForest :: Forest a -> [[a]]
flattenForest forest = concatMap flattenTree forest
flattenTree :: Tree a -> [[a]]
flattenTree (Node a forest) = case forest of
[] -> [[a]]
_ -> map (a:) (flattenForest forest)
Let's try it!
>>> flattenForest [Node "day 1" [Node "Person 1" [Node "Bill 1" []],Node "Person 2" [Node "Bill 2" []]]]
[["day 1","Person 1","Bill 1"],["day 1","Person 2","Bill 2"]]
Now, technically I didn't have to use mutually recursive functions. I could have done a single recursive function. I was just following the definition of the Tree
type from Data.Tree
.
So in theory I could have shortened the code even further by skipping the intermediate Tree
type and just parsing the flattened result directly, but I figured you might want to use the Tree
-based representation for other purposes.
The key take home points from this are:
If you do these, you will write robust and elegant code that exactly matches the problem.
Here is the final code that incorporates everything I've said:
import Control.Applicative
import Control.Monad.Trans.State.Strict
import Data.Tree
type YourType = [(Int, [String])]
type Parser a = StateT [(Int, [String])] Maybe a
runParser :: Parser a -> [(Int, [String])] -> Maybe a
runParser = evalStateT
parseElement :: Int -> Parser String
parseElement level = StateT $ \list -> case list of
[] -> Nothing
(level', strs):rest -> case strs of
[str] ->
if (level' == level)
then Just (str, rest)
else Nothing
_ -> Nothing
parseTree :: Int -> Parser (Tree String)
parseTree level = Node <$> parseElement level <*> parseForest (level + 1)
parseForest :: Int -> Parser (Forest String)
parseForest level = many (parseTree level)
eof :: Parser ()
eof = StateT $ \list -> case list of
[] -> Just ((), [])
_ -> Nothing
flattenForest :: Forest a -> [[a]]
flattenForest forest = concatMap flattenTree forest
flattenTree :: Tree a -> [[a]]
flattenTree (Node a forest) = case forest of
[] -> [[a]]
_ -> map (a:) (flattenForest forest)
Upvotes: 5
Reputation: 3770
I seem to have solved it.
group :: [(Integer, [String])] -> [[String]]
group ((n, str):ls) = let
(children, rest) = span (\(m, _) -> m > n) ls
subgroups = map (str ++) $ group children
in if null children then [str] ++ group rest
else subgroups ++ group rest
group [] = []
I didn't test it much though.
The idea is to notice the recursive pattern. This function takes the first element (N, S) of the list and then gathers all entries in higher levels until another element at level N, into a list 'children'. If there are no children, we are at the top level and S forms the output. If there are some, S is appended to all of them.
As for why your algorithm doesn't work, the problem is mostly in row
. Notice that you are not descending recursively.
Trees can be used too.
data Tree a = Node a [Tree a] deriving Show
listToTree :: [(Integer, [String])] -> [Tree [String]]
listToTree ((n, str):ls) = let
(children, rest) = span (\(m, _) -> m > n) ls
subtrees = listToTree children
in Node str subtrees : listToTree rest
listToTree [] = []
treeToList :: [Tree [String]] -> [[String]]
treeToList (Node s ns:ts) = children ++ treeToList ts where
children = if null ns then [s] else map (s++) (treeToList ns)
treeToList [] = []
The algorithm is essentially the same. The first half goes to the first function, the second half to the second.
Upvotes: 3