Reputation: 2741
I would like to use Parsec to parse several lists of commands that are laid out in columns, like an array. For example, my lists are formatted like this:
```
Command1 arg1 arg2 Command1 arg1 arg2 Command1 arg1 arg2
Command2 arg1 Command3 arg1 arg2 arg3
Command3 arg1 arg2 arg3
Command4
Command3 arg1 arg2 arg3 Command2 arg1
Command4
Command4
Command5 arg1 Command2 arg1
```
These commands are supposed to be parsed column by column with state changes in the parser.
My idea is to gather the commands into separate lists of strings and parse these strings with a subparser that runs inside the main parser.
I looked through the API of the Parsec library, but I didn't find a function that does this.
I considered using `runParser`, but this function only returns the parser's result, not its state.
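One possible workaround, sketched here under the assumption that the subparser works on a `String` and uses the same user-state type as the outer parser: make the subparser return `getState` alongside its result, start it with `runParser` from the outer parser's current state, and copy the final state back with `putState`. The names `runSubParser` and `word` are hypothetical, chosen only for this illustration.

```haskell
import Text.Parsec

-- Run `sub` on a separate input string, starting from the outer parser's
-- user state, then copy the subparser's final user state back.
-- `runSubParser` is a hypothetical helper, not part of Parsec.
runSubParser :: Parsec String u a -> String -> Parsec String u a
runSubParser sub input = do
  st <- getState
  -- Wrap the subparser so it also returns the state it ends with,
  -- and require it to consume the whole substring.
  let withState = do
        r   <- sub
        eof
        st' <- getState
        return (r, st')
  case runParser withState st "subparser" input of
    Left err       -> fail (show err)  -- propagate the inner error
    Right (r, st') -> do
      putState st'
      return r

-- Example subparser: a word of letters that counts itself in the user state.
word :: Parsec String Int String
word = do
  w <- many1 letter
  modifyState (+ 1)
  return w

main :: IO ()
main = do
  let outer = do
        ws <- runSubParser (word `sepBy` char ' ') "foo bar baz"
        n  <- getState
        return (ws, n)
  print (runParser outer 0 "outer" "")
```

Note that error positions reported by the inner run refer to the substring, not to the original input.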
I also considered writing my own parser-running function inspired by `runParsecT` and `mkPT`, but the `ParsecT` constructor and `initialPos` are not available (they are not exported by the library).
Is it possible to run a subparser inside a parser with Parsec?
If not, can a library such as megaparsec solve my problem?
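Another option that stays within Parsec's exported API is to swap the input temporarily with `getInput`/`setInput` and restore the position with `getPosition`/`setPosition`. Because the subparser then runs inside the outer parser, its user-state changes are kept with no extra plumbing. This is a minimal sketch assuming `String` input; `withInput`, `digitsCounted` and `cells` are hypothetical names.

```haskell
import Text.Parsec

-- Temporarily replace the remaining input with `chunk`, run `p` on it,
-- then restore the original input and source position. User-state changes
-- made by `p` are kept, since `p` runs inside the same parser.
-- `withInput` is a hypothetical helper, not part of Parsec.
withInput :: String -> Parsec String u a -> Parsec String u a
withInput chunk p = do
  savedInput <- getInput
  savedPos   <- getPosition
  setInput chunk
  r <- p <* eof          -- the subparser must consume the whole chunk
  setInput savedInput
  setPosition savedPos
  return r

-- Example subparser: digits, each one counted in the user state.
digitsCounted :: Parsec String Int String
digitsCounted = many (digit <* modifyState (+ 1))

-- Split a line on tabs into raw cells, then parse each cell separately.
cells :: Parsec String Int [String]
cells = do
  raw <- many (noneOf "\t\n") `sepBy` tab
  mapM (\c -> withInput c digitsCounted) raw

main :: IO ()
main = print (runParser ((,) <$> cells <*> getState) 0 "example" "12\t345\t6")
```

If the subparser fails, the whole parse fails with it, so the input is not restored in that case.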
Upvotes: 2
Views: 312
Reputation: 1590
As a starting point, the simplest answer to "How do I make a subparser?" is to use monadic bind, applicative `<*>`, alternative `<|>`, and the combinators provided by the library. Assuming that each command belongs to a single type (as in Hans Kruger's answer) and an arbitrary number of columns, the code below might make a good template.
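```haskell
import Text.Parsec
import Text.Parsec.Char
import Data.List (transpose)

-- CommandType and parseCommand1, parseCommand2, ... stand for your own
-- command type and per-command parsers.

-- The whole file: rows of commands separated by newlines.
-- Note the argument order: sepBy <item parser> <separator parser>.
cmdFileParser :: Parsec String u [[CommandType]]
cmdFileParser = sepBy cmdLineParser sepParser
  where
    sepParser = newline -- from Text.Parsec.Char

-- One row: commands separated by tabs.
cmdLineParser :: Parsec String u [CommandType]
cmdLineParser = sepBy cmdParser sepParser
  where
    sepParser = tab

-- A single command: try each alternative in turn
-- (wrap alternatives in `try` if they share a prefix).
cmdParser :: Parsec String u CommandType
cmdParser = parseCommand1
        <|> parseCommand2
        <|> parseCommand3
        -- <|> ... one alternative per command
```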
Then, after parsing, transpose the `[[CommandType]]` to group the commands by column:
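```haskell
main :: IO ()
main = do
  ...
  -- runParser takes the parser, the initial user state, a source name
  -- (used in error messages) and the input.
  let ret = runParser cmdFileParser ()
              "debug string telling what was parsed"
              stringToParse
  case ret of
    Left e     -> putStrLn ("wasn't parsed: " ++ show e)
    Right cmds -> doSomethingWith (transpose cmds)
```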
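To make the template concrete, here is a runnable sketch with a hypothetical two-constructor `CommandType` (`Command2` with one argument and `Command4` with none, written as in the question, with columns separated by tabs). The `try` is needed because both keywords start with `Command`.

```haskell
import Text.Parsec
import Data.List (transpose)

-- Hypothetical command type standing in for the answer's CommandType.
data CommandType
  = Command2 String
  | Command4
  deriving (Show, Eq)

-- One argument: a space followed by letters/digits.
arg :: Parsec String u String
arg = char ' ' *> many1 alphaNum

-- `try` backtracks if "Command2" fails after matching "Command".
cmdParser :: Parsec String u CommandType
cmdParser = try (Command2 <$> (string "Command2" *> arg))
        <|> (Command4 <$ string "Command4")

cmdLineParser :: Parsec String u [CommandType]
cmdLineParser = cmdParser `sepBy` tab

cmdFileParser :: Parsec String u [[CommandType]]
cmdFileParser = cmdLineParser `sepBy` newline <* eof

main :: IO ()
main =
  case runParser cmdFileParser () "example"
         "Command2 a\tCommand4\nCommand4\nCommand2 b\tCommand2 c" of
    Left e     -> putStrLn ("wasn't parsed: " ++ show e)
    Right cmds -> print (transpose cmds)
```

`transpose` skips missing cells, so a short row contributes only to the leftmost columns. If rows can have gaps in the middle, you would need an explicit placeholder for empty cells.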
I would say that the above is a typical approach, though there are of course variations. For instance, if you know there are exactly three columns, you could replace the `cmdLineParser` above with the following:
```haskell
cmdLineParser :: Parsec String u (CommandType, CommandType, CommandType)
cmdLineParser = (,,) <$> ct <*> ct <*> cmdParser
  where
    -- a command followed by its separating tab
    ct = cmdParser <* tab
```
I would say that using `getState` is atypical. When I first started using Parsec, I remember getting something like what I think you are after working, but it wasn't pretty. Of course, if you really just want to return the strings, you can always parse any character except your newlines and tabs:
```haskell
-- everything up to the next newline or tab
cmdParser :: Parsec String u String
cmdParser = many (noneOf "\n\t")
```
Be careful with the above, though. I've been burned by `many` before, where it consumed too much or always succeeded, so I'm not confident that this exact formulation will give you the command string. Also, if you parse each command as a string and then reparse it in your `main`, you will be parsing twice!
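The sketch below illustrates the `many` pitfall: `many` happily matches zero characters, so it succeeds on an empty cell, whereas `many1` insists on at least one character. The parser names are hypothetical.

```haskell
import Text.Parsec

-- Everything up to a tab or newline. `many` also accepts zero
-- characters, so it "succeeds" on an empty cell.
rawCmdMany :: Parsec String () String
rawCmdMany = many (noneOf "\n\t")

-- `many1` requires at least one character, so an empty cell is an error.
rawCmdMany1 :: Parsec String () String
rawCmdMany1 = many1 (noneOf "\n\t")

main :: IO ()
main = do
  -- The cell before the first tab is empty: prints Right ""
  print (parse rawCmdMany "" "\tCommand4")
  -- `many1` reports a parse error instead.
  print (parse rawCmdMany1 "" "\tCommand4")
  -- With non-empty cells, splitting on tabs works as intended.
  print (parse (rawCmdMany1 `sepBy` tab) "" "Command2 arg1\tCommand4")
```

The "takes too much" side shows up with the separators: `noneOf "\n\t"` stops only at tabs and newlines, so if the columns were separated by spaces instead, a single `rawCmdMany1` would swallow the whole line.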
Upvotes: 2
Reputation: 108
Not a complete answer, more a question for clarification:
Is it necessary to build a list of strings? I would prefer to parse the input and convert it into a more specific datatype. That way you can take advantage of Haskell's type guarantees.
I would begin by defining a datatype for my commands:
```haskell
data Command = Command1 Argtype1
             | Command2 Argtype2
             | Command3 Argtype1 Argtype2

data Argtype1 = Arg1 | Arg2 | ArgX
data Argtype2 = Arg2_1 | Arg2_2
```
After that, you can parse the input into these datatypes. At the end of parsing you can `mappend` the results (for lists, that amounts to adding each element at the front with `(:)`).
You end up with a value of type `[Command]`, which you can then work with further.
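As an illustration of parsing straight into these datatypes, here is a sketch written with Parsec to match the rest of the thread (the same idea carries over to megaparsec). The textual syntax, a keyword followed by space-separated arguments such as `Command3 ArgX Arg2_1`, and the helper names `identChar`, `keyword`, `commands` and `commandLine` are assumptions, not taken from the question.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Command = Command1 Argtype1
             | Command2 Argtype2
             | Command3 Argtype1 Argtype2
             deriving (Show, Eq)

data Argtype1 = Arg1 | Arg2 | ArgX deriving (Show, Eq)
data Argtype2 = Arg2_1 | Arg2_2 deriving (Show, Eq)

-- Characters that may continue a keyword (so "Arg2" does not match "Arg2_1").
identChar :: Parser Char
identChar = alphaNum <|> char '_'

-- Match a whole keyword, then skip the spaces after it. `try` lets the
-- next alternative start over when keywords share a prefix.
keyword :: String -> Parser String
keyword k = try (string k <* notFollowedBy identChar) <* many (char ' ')

argtype1 :: Parser Argtype1
argtype1 = Arg1 <$ keyword "Arg1" <|> Arg2 <$ keyword "Arg2" <|> ArgX <$ keyword "ArgX"

argtype2 :: Parser Argtype2
argtype2 = Arg2_1 <$ keyword "Arg2_1" <|> Arg2_2 <$ keyword "Arg2_2"

command :: Parser Command
command = Command1 <$> (keyword "Command1" *> argtype1)
      <|> Command2 <$> (keyword "Command2" *> argtype2)
      <|> Command3 <$> (keyword "Command3" *> argtype1) <*> argtype2

-- Build the list by consing each command onto the front of the rest
-- with (:); `many command` would give the same result here.
commands :: Parser [Command]
commands = ((:) <$> command <*> commands) <|> pure []

-- A whole line of commands, e.g. "Command1 ArgX Command2 Arg2_1".
commandLine :: Parser [Command]
commandLine = many (char ' ') *> commands <* eof

main :: IO ()
main = print (parse commandLine "" "Command1 ArgX Command3 Arg2 Arg2_2 Command2 Arg2_1")
```

An ill-typed line such as `Command1 Arg2_1` is rejected by the parser itself, which is the kind of guarantee a list of raw strings cannot give.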
For parsing the text, you can follow the introduction to the megaparsec package at https://markkarpov.com/megaparsec/parsing-simple-imperative-language.html
Or do you mean something completely different? Perhaps that each line (containing several commands) should, as a whole, be one input to a state machine whose state changes according to the commands? In that case, I wonder why the state machine should be implemented as a parser.
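If that second reading is the right one, here is a sketch of the alternative: parse the lines into a list of command lists first, then run the state machine as an ordinary fold, one line per step, outside the parser. The command set (`Push`, `Pop`, `Reset`) and the `Machine` type are purely hypothetical.

```haskell
import Data.List (foldl')

-- Hypothetical commands and machine state, for illustration only.
data Cmd = Push Int | Pop | Reset deriving (Show, Eq)

newtype Machine = Machine [Int] deriving (Show, Eq)

-- Apply one line (a list of commands) to the machine, left to right.
step :: Machine -> [Cmd] -> Machine
step = foldl' apply
  where
    apply (Machine xs)     (Push n) = Machine (n : xs)
    apply (Machine [])     Pop      = Machine []
    apply (Machine (_:xs)) Pop      = Machine xs
    apply _                Reset    = Machine []

-- Run the machine over all parsed lines, starting from an empty state.
run :: [[Cmd]] -> Machine
run = foldl' step (Machine [])

main :: IO ()
main = print (run [[Push 1, Push 2], [Pop], [Push 3]])
```

Keeping the parser and the state machine separate means each can be tested on its own.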
Upvotes: 4