Reputation: 664579
I am trying to split my input into those parts that match a certain pattern and the rest, let's say
data Data = A Int | B Char | C String
parseDatas :: Parsec [Token] () a [Data]
I already have written two more-or-less complicated parsers
parseA :: Parsec [Token] () Data
parseB :: Parsec [Token] () Data
that match the things I am looking for. Now the obvious solution is
parseDatas = many (parseA <|> parseB <|> parseC)
where the parser for the intermediate parts would look like this:
makeC :: [Token] -> Data
makeC = C . concatMap show -- or something like this
parseC :: Parsec [Token] () Data
parseC = makeC <$> many anyToken
Meh, that throws a runtime [ERROR] Text.ParserCombinators.Parsec.Prim.many: combinator 'many' is applied to a parser that accepts an empty string.
- ok, easily fixed:
parseC = makeC <$> many1 anyToken
But now parseC
consumes the entire input (that starts with something I'm not looking for), ignoring any patterns that should yield an A
or B
!
If my patterns were regexes1, I would now have changed the +
operator to the non-greedy +?
operator. How can I do the same for the many1
parser combinator?
1: which I cannot use, as I'm operating on tokens not characters
A solution I found was
parseC = makeC <$> many1 (notFollowedBy (parseA <|> parseB) >> anyToken)
but that does look, uh, suboptimal. It's not really generic. There must be something better.
I also had a look at Parsec how to find "matches" within a string where the suggestion was to define a recursive parser, but that looks like a hazzle if I don't want to drop the intermediate tokens and collect them in a list instead.
Upvotes: 2
Views: 878
Reputation: 3426
Also, for the general question:
…like in regex where we would repeat pattern p
non-greedily by writing p*?
?
The answer is:
manyTill_ p q
where q
is the entire rest of the parser.Or, for p+?
, it would be someTill_ p q
where q
is the entire rest of the parser.
Upvotes: 1
Reputation: 3426
The
sepCap
parser combinator from
replace-megaparsec
can split a string into those parts that match a certain pattern and the rest.
Try this:
sepCap (parseA <|> parseB)
Or, if parseA
and parseB
are parsers for different types of things, then maybe use eitherP
like:
sepCap (eitherP parseA parseB)
Upvotes: 1
Reputation: 2317
You could let parseC consume exactly one token at a time:
parseDatas = many $ parseA <|> parseB <|> (C . show <$> anyToken)
and then, if you want, group adjacent C
s into one to conserve semantics:
groupCs (C c) (C c':xs) = C (c ++ c') : xs
groupCs x xs = x : xs
parseDatas = foldr groupCs [] <$> many (parseA <|> parseB <|> (C . show <$> anyToken))
If you want to apply some operation make :: [Token] -> String
on consecutive C
s:
data Data c = A Int | B Char | C c deriving Functor
groupCs :: [Data a] -> [Data [a]] -> [Data [a]]
groupCs (C c) (C cs:xs) = C (c:cs) : xs
groupCs (C c) xs = C [c] : xs
groupCs x xs = x : xs
parseDatas = (map.fmap) make . foldr groupCs [] <$> many (parseA <|> parseB <|> (C <$> anyToken))
Upvotes: 3