Damian Nadales

Reputation: 5037

Parse a sub-string with parsec (by ignoring unmatched prefixes)

I would like to extract the repository name from the first line of git remote -v, which is usually of the form:

origin git@github.com:some-user/some-repo.git (fetch)

I quickly made the following parser using parsec:

import Text.Parsec
import Text.Parsec.String (Parser)

-- | Parse the repository name from the output given by the first line of `git remote -v`.
repoNameFromRemoteP :: Parser String
repoNameFromRemoteP = do
    _ <- originPart >> hostPart
    _ <- char ':'
    firstPart <- many1 nameChar
    _ <- char '/'
    secondPart <- many1 nameChar
    _ <- string ".git"
    return $ firstPart ++ "/" ++ secondPart
    where
      -- User and repository names may contain hyphens, e.g. "some-user",
      -- which alphaNum alone would reject.
      nameChar = alphaNum <|> char '-'
      originPart = many1 alphaNum >> space
      hostPart =  many1 alphaNum
               >> (string "@" <|> string "://")
               >> many1 alphaNum `sepBy` char '.'

But this parser looks a bit awkward. I'm really only interested in whatever follows the colon (":"), and it would be easier if I could write a parser for just that part.

Is there a way to have parsec skip a character upon a failed match, and re-try from the next position?

Upvotes: 4

Views: 533

Answers (2)

James Brock

Reputation: 3426

The sepCap combinator from replace-megaparsec can skip a character upon a failed match, and re-try from the next position.

Maybe this is overkill for your particular case, but it does solve the general problem.

import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char
import Control.Monad (void)
import Data.Either
import Data.Maybe
import Data.Void (Void)

username :: Parsec Void String String
username = do
    void $ single ':'
    some $ alphaNumChar <|> single '-'

listToMaybe . rights =<< parseMaybe (sepCap username)
    "origin git@github.com:some-user/some-repo.git (fetch)"
Just "some-user"

Upvotes: 2

Daniel Wagner

Reputation: 153247

If I've understood the question, try many (noneOf ":"). This will consume any character until it sees a ':', then stop.
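As a rough sketch of that first suggestion applied to the line from the question: the names repoAfterColon and nameChar are made up for illustration, and the assumption that user and repository names contain only letters, digits and hyphens is mine. It only works when the first ':' in the line is the one you care about.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- | Skip everything before the first ':', then parse "user/repo.git".
repoAfterColon :: Parser String
repoAfterColon = do
    _ <- many (noneOf ":")      -- discard the unmatched prefix
    _ <- char ':'
    user <- many1 nameChar
    _ <- char '/'
    repo <- many1 nameChar
    _ <- string ".git"
    return (user ++ "/" ++ repo)
  where
    -- Assumption: names consist of letters, digits and hyphens.
    nameChar = alphaNum <|> char '-'

main :: IO ()
main = print (parse repoAfterColon "" "origin git@github.com:some-user/some-repo.git (fetch)")
-- prints: Right "some-user/some-repo"
```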

Edit: Seems I had not understood the question. You can use the try combinator to turn a parser which may consume some characters before failing into one that consumes no characters on a failure. So:

skipUntil p = try p <|> (anyChar >> skipUntil p)

Beware that this can be quite expensive, both in runtime (because it will try matching p at every position) and memory (because try prevents p from consuming characters and so the input cannot be garbage collected at all until p completes). You might be able to alleviate the first of those two problems by parameterizing the anyChar bit so that the caller could choose some cheap parser for finding candidate positions; e.g.

skipUntil p skipper = try p <|> (skipper >> skipUntil p skipper)

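You could then use a skipper built on the many (noneOf ":") construction above, so that p is only tried at positions that start with a ':'. Note that the skipper must consume at least one character whenever p fails (for example anyChar >> many (noneOf ":")): many (noneOf ":") on its own consumes nothing when the input already starts with ':', so the recursion would never advance.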
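Here is a minimal sketch of how the parameterized version might be used on the line from the question. The parser repoName and the helper toNextColon are hypothetical names of mine, as is the assumption that names are made of letters, digits and hyphens. The skipper first consumes one character and then runs up to the next ':', so it always makes progress even when p fails right at a ':'.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- | Try p here; if it fails, run skipper and try again further along.
skipUntil :: Parser a -> Parser b -> Parser a
skipUntil p skipper = try p <|> (skipper >> skipUntil p skipper)

-- | Hypothetical parser for ":user/repo.git".
repoName :: Parser String
repoName = do
    _ <- char ':'
    user <- many1 nameChar
    _ <- char '/'
    repo <- many1 nameChar
    _ <- string ".git"
    return (user ++ "/" ++ repo)
  where
    -- Assumption: names consist of letters, digits and hyphens.
    nameChar = alphaNum <|> char '-'

-- | Cheap skipper: drop one character, then everything up to the next ':'.
toNextColon :: Parser String
toNextColon = anyChar >> many (noneOf ":")

main :: IO ()
main = do
    let line = "origin git@github.com:some-user/some-repo.git (fetch)"
    -- Retry at every position (equivalent to the one-argument version).
    print (parse (skipUntil repoName anyChar) "" line)
    -- Retry only at positions that start with ':'.
    print (parse (skipUntil repoName toNextColon) "" line)
-- both lines print: Right "some-user/some-repo"
```

If no position matches, both variants fail once anyChar hits the end of the input, so the recursion terminates.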

Upvotes: 4
