Converting a hierarchical data structure to a flat one in Haskell

Question

I'm extracting some data from a text document organized like this:

- "day 1"
    - "Person 1"
        - "Bill 1"
    - "Person 2"
        - "Bill 2"

I can read this into a list of tuples that looks like this:

[(0,["day 1"]),(1,["Person 1"]),(2,["Bill 1"]),(1,["Person 2"]),(2,["Bill 2"])]

Where the first item of each tuple indicates the heading level, and the second item the information associated with each heading.

My question is, how can I get a list of items that looks like this:

[["day 1","Person 1","Bill 1"],["day 1","Person 2","Bill 2"]]

I.e. one list per deepest nested item, containing all the information from the headings above it. The closest I've gotten is this:

f [] = []
f (x:xs) = row:f rest where
    leaves = takeWhile (\i -> fst i > fst x) xs
    rest = dropWhile (\i -> fst i > fst x) xs
    row = concat $ map (\i -> (snd x):[snd i]) leaves

Which gives me this:

[[["day 1"],["Person 1"],["day 1"],["Bill 1"],["day 1"],["Person 2"],["day 1"],["Bill 2"]]]

I'd like the solution to work for any number of levels.

P.s. I'm new to Haskell. I have a sense that I could/should use a tree to store the data, but I can't wrap my head around it. I also could not think of a better title.

Karolis Juodelė · Accepted Answer

I seem to have solved it.

group :: [(Integer, [String])] -> [[String]]
group ((n, str):ls) = let
      (children, rest) = span (\(m, _) -> m > n) ls
      subgroups = map (str ++) $ group children
   in if null children then [str] ++ group rest
      else subgroups ++ group rest
group [] = []

I didn't test it much though.

The idea is to notice the recursive pattern. The function takes the first element (N, S) of the list and gathers all following entries at deeper levels (level greater than N), up to the next element at level N or above, into a list 'children'. If there are no children, we are at a leaf (a deepest nested item) and S itself forms an output row. If there are some, S is prepended to every row produced by recursing on the children.
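
To see the recursion at work, here is a minimal sketch that applies the same function to the sample input from the question. The names `flatten` and `sample` are illustrative only: `flatten` is the function above under a different name (chosen to avoid a clash with `Data.List.group`, should that module be imported), and `sample` is the question's list of tuples.

```haskell
-- Same recursion as the function above; `flatten` and `sample` are
-- illustrative names, not part of the original answer.
flatten :: [(Integer, [String])] -> [[String]]
flatten [] = []
flatten ((n, str) : ls)
  | null children = str : flatten rest   -- leaf: its own payload is a row
  | otherwise     = map (str ++) (flatten children) ++ flatten rest
  where
    -- children: the run of entries nested below level n
    (children, rest) = span (\(m, _) -> m > n) ls

-- The list of tuples from the question.
sample :: [(Integer, [String])]
sample =
  [ (0, ["day 1"])
  , (1, ["Person 1"]), (2, ["Bill 1"])
  , (1, ["Person 2"]), (2, ["Bill 2"])
  ]
```

Evaluating `flatten sample` should give `[["day 1","Person 1","Bill 1"],["day 1","Person 2","Bill 2"]]`, which is the output the question asks for.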

As for why your algorithm doesn't work, the problem is mostly in row: it pairs the top heading with each following entry one at a time, whatever its depth, instead of descending recursively into the nested entries.
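
One possible repair that keeps the shape of the question's `f` might look like the sketch below. The name `fFixed` and the helper `deeper` are hypothetical, and the `Int` level type is an assumption; the only real change is that the rows are built by recursing on `leaves`.

```haskell
-- A possible repair of the question's `f`; `fFixed` and `deeper` are
-- hypothetical names introduced for this sketch.
fFixed :: [(Int, [String])] -> [[String]]
fFixed [] = []
fFixed (x:xs) = rows ++ fFixed rest
  where
    deeper i = fst i > fst x          -- is entry i nested below x?
    leaves = takeWhile deeper xs
    rest   = dropWhile deeper xs
    -- Recurse into the nested entries instead of pairing x with each one.
    rows
      | null leaves = [snd x]
      | otherwise   = map (snd x ++) (fFixed leaves)
```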

Trees can be used too.

data Tree a = Node a [Tree a] deriving Show

listToTree :: [(Integer, [String])] -> [Tree [String]]
listToTree ((n, str):ls) = let
      (children, rest) = span (\(m, _) -> m > n) ls
      subtrees = listToTree children
   in Node str subtrees : listToTree rest
listToTree [] = []

treeToList :: [Tree [String]] -> [[String]]
treeToList (Node s ns:ts) = children ++ treeToList ts where
   children = if null ns then [s] else map (s++) (treeToList ns)
treeToList [] = []

The algorithm is essentially the same, split in two: listToTree does the grouping by level (the span step), and treeToList does the prefixing of each heading onto the rows below it.
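
As a variation on the tree approach, the flattening step can be phrased as "collect every root-to-leaf path, then concatenate the payloads along each path". The sketch below assumes the same `Tree` type as above; `toForest`, `paths`, and `flattenViaTree` are hypothetical names, and `toForest` is just `listToTree` generalised to any payload type.

```haskell
data Tree a = Node a [Tree a] deriving (Show, Eq)

-- Build a forest from (level, payload) pairs, using the same span step
-- as listToTree.
toForest :: [(Integer, a)] -> [Tree a]
toForest [] = []
toForest ((n, x) : ls) = Node x (toForest children) : toForest rest
  where
    (children, rest) = span (\(m, _) -> m > n) ls

-- All root-to-leaf paths of a single tree.
paths :: Tree a -> [[a]]
paths (Node x []) = [[x]]
paths (Node x ts) = map (x :) (concatMap paths ts)

-- One row per leaf: concatenate the payloads found along its path.
flattenViaTree :: [(Integer, [String])] -> [[String]]
flattenViaTree = map concat . concatMap paths . toForest
```

Keeping `paths` generic separates the shape of the hierarchy from what each heading carries, so the same function would work if the payload were something other than a list of strings.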
