Reputation: 161
I would like to parse string to list in Haskell, but I have no idea how to write it proper. String structure is:
A [B [], C [D [], E []]]
and it represents structure
A-+-B
|
`-C-+-D
|
`-E
And ideas? Thanks!
Upvotes: 0
Views: 1667
Reputation: 3497
Your best bet when parsing in Haskell is almost always Parsec. Here's an example.
import Text.ParserCombinators.Parsec
data Tree a = Tree [Tree a] | Leaf a deriving Show
parseLeaf :: Parser (Tree Char)
parseLeaf = noneOf "[]" >>= return.Leaf
parseNode :: Parser (Tree Char)
parseNode = do
char '['
a <- parseTree
char ']'
return (Tree a)
parseTree = many1 (parseLeaf <|> parseNode)
Testing this out:
> parseTest parseTree "a[aa]"
[Leaf 'a',Tree [Leaf 'a',Leaf 'a']]
> parseTest parseTree "a[aa]aaa"
[Leaf 'a',Tree [Leaf 'a',Leaf 'a'],Leaf 'a',Leaf 'a',Leaf 'a']
> parseTest parseTree "a[aa[aaaaaa]]aaa"
[Leaf 'a',Tree [Leaf 'a',Leaf 'a',Tree [Leaf 'a',Leaf 'a',Leaf 'a',Leaf 'a',Leaf 'a',Leaf 'a']],Leaf 'a',Leaf 'a',Leaf 'a']
Here's how it works. Parsers in Parsec are monadic, and so support the usual do-notation. (You can write parsers applicatively as well, you can look up how to do that elsewhere.)
We start with a simple data structure
data Tree a = Tree [Tree a] | Leaf a deriving Show
This is not quite the same as what you wanted, but as I wasn't entirely sure what your semantics were, I've used this example. You should be able to adapt it for your purposes.
We then need a parser for each possible part of the tree. A leaf is fairly simple, it is just anything that isn't a bracket:
parseLeaf :: Parser (Tree Char)
parseLeaf = noneOf "[]" >>= return.Leaf
Note that this could have been written out in longhand like
parseLeaf = do
a <- noneOf "[]"
return (Leaf a)
Then to parse a branching part of the tree, we need to parse the opening and closing brackets. In between the brackets we can have a whole tree again.
parseNode :: Parser (Tree Char)
parseNode = do
char '['
a <- parseTree
char ']'
return (Tree a)
So what is parseTree
? It is just many of either of the parsers we have written. The <|>
operator allows the parser to choose either parser, whichever parses correctly first. So we have
parseTree = many1 (parseLeaf <|> parseNode)
You should be able to adapt this to your purposes. It looks like your structure might be somewhat more like this:
data Structure a = Node (Structure a) a a | Leaf a
By following the same principles, working out what the parser is needed for each possibility and then combining them, you should be parsing in no time.
UPDATE
Here is a very quick and dirty version of parsing the data structure you asked about. It does not support spaces or commas, but should help demonstrate the basic principle.
data Tree = Tree Char [Tree] deriving Show
parseTree :: Parser Tree
parseTree = do
character <- noneOf "[]"
subtree <- parseSubTree
return $ Tree character subtree
parseSubTree :: Parser [Tree]
parseSubTree = do
char '['
trees <- many parseTree
char ']'
return trees
And here is a version with the commas and whitespace added in a fairly simple way. There are a lot of useful combinators in the parsec library that could simplify and improve this, you should investigate them yourself. Note also the applicative style used for the symbol
shortcut parser definition. Many people prefer applicative style for parsers and it can be a lot more succinct, so that is worth finding out about as well.
data Tree = Tree Char [Tree] deriving Show
symbol :: String -> Parser String
symbol s = string s <* spaces
parseTree :: Parser Tree
parseTree = do
character <- noneOf "[]"
spaces
subtree <- parseSubTree
return $ Tree character subtree
parseSubTree :: Parser [Tree]
parseSubTree = do
symbol "["
trees <- sepBy parseTree (symbol ",")
symbol "]"
return trees
Here it is working:
> parseTest parseTree "A [ A [ B [ ] , C [ ], D [ ] ] ] "
Tree 'A' [Tree 'A' [Tree 'B' [],Tree 'C' [],Tree 'D' []]]
Upvotes: 2