Reputation: 1609

Haskell grammar to validate a string in specific format

I would like to define a grammar in Haskell that matches a string in the format "XY12XY" (some letters followed by some digits), e.g. variable names in programming languages.

customer123 is a valid variable name, but '123customer' is not a valid variable name.

I am at a loss as to how to define the grammar and write a validator function that checks whether a given string is a valid variable name. I have been trying to understand and adapt the parser example at https://wiki.haskell.org/GADT but I just can't get my head around how to tweak it to make it work for my needs.

If any kind Haskell guru could help me define this, I would appreciate it:

validate :: ValidFormat -> String -> Bool
validate f [] = False
validate f s = ...

I would like to define the ValidFormat grammar as:

varNameFormat = Concat Alpha $ Concat Alpha Numeric

Upvotes: 1

Answers (2)

Oleg Tsybulskyi

Reputation: 23

I've adapted this from the examples that come with regex-applicative:

import Text.Regex.Applicative
import Data.Char
import Data.Maybe

varNameFormat :: RE Char String
varNameFormat = (:) <$> psym isAlpha <*> many (psym isAlphaNum)

validate :: RE Char String -> String -> Bool
validate re str = isJust $ str =~ re

In GHCi this gives:

*Main> validate varNameFormat "a123"
True
*Main> validate varNameFormat "1a23"
False

Upvotes: 0

hasufell

Reputation: 573

I'd start with a simple parser and see if that satisfies your needs, unless you can explain why this is not enough for your use case. Parsers are pretty straightforward. I'll give a very simple (and maybe incomplete) example with attoparsec:

import Control.Applicative
import Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString.Char8 as B


validateVar :: B.ByteString -> Bool
validateVar bstr = case parseOnly variableP bstr of
  Right _ -> True
  Left  _ -> False

variableP :: Parser String
variableP =
  (++)
  <$> many1 letter_ascii            -- must start with one or more letters
  <*> many (digit <|> letter_ascii) -- then can have any combination of letters/digits
  <* endOfInput                     -- make sure we don't ignore invalid trailing chars

variableP combines parsers via <*>, which means you have to handle the results of both many1 letter_ascii and many (digit <|> letter_ascii). In this case we just concatenate the two results via (++); check the types of many1, many, letter_ascii and digit. The <* says "parse this too, but discard the result of the right-hand parser" (otherwise you'd have to handle three results).

That means if you run the parser on "abc123" you'll get back "abc123". If you parse "1abc" the parser will fail.

Check the type of parseOnly:

parseOnly :: Parser a -> ByteString -> Either String a

We pass it our parser and the bytestring it should parse. If the parser fails we'll get Left <something went wrong>. If the parser succeeds, we'll get Right <our string>. The cool thing is... instead of just giving a string on success, we could do pretty much anything with the results in variableP, as in: use something different than (++), convert the types and whatnot (mind that the Parser type might also have to change then).
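As a rough sketch of "doing something different than (++)", the parser below keeps the two parts separate in a record instead of gluing them back together. The VarName type, its field names and varNameP are made up for this illustration and are not part of attoparsec; the combinators are the same ones used in variableP above.

```haskell
import Control.Applicative (many, (<|>))
import Data.Attoparsec.ByteString.Char8
  (Parser, digit, endOfInput, letter_ascii, many1, parseOnly)
import qualified Data.ByteString.Char8 as B

-- Hypothetical result type: the leading letters and whatever follows them.
data VarName = VarName
  { leadingLetters :: String
  , remainder      :: String
  } deriving (Eq, Show)

-- Same shape as variableP, but the results are combined with the
-- VarName constructor instead of (++), so the parser type changes too.
varNameP :: Parser VarName
varNameP =
  VarName
  <$> many1 letter_ascii            -- one or more leading letters
  <*> many (digit <|> letter_ascii) -- any mix of letters/digits afterwards
  <* endOfInput                     -- reject trailing garbage

-- Convenience wrapper so callers can pass a plain String.
parseVarName :: String -> Either String VarName
parseVarName = parseOnly varNameP . B.pack
```

Because many1 letter_ascii is greedy, parseVarName "abc123" should split into the letters "abc" and the remainder "123", while "1abc" is rejected with a Left.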

Since we only care if the parser succeeded in validateVar, we can just ignore the result in either case.

So instead of defining GADTs for your grammar, you just define Parsers.
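If you still want a first-class grammar value like the one in the question, a plain data type (no GADT needed) can be translated into a parser. This is only a minimal sketch: it uses ReadP from base rather than attoparsec, the ValidFormat constructors are modelled on the names in the question, and I'm assuming Alpha and Numeric each mean "one or more" characters.

```haskell
import Data.Char (isAlpha, isDigit)
import Text.ParserCombinators.ReadP
  (ReadP, eof, readP_to_S, satisfy, skipMany1)

-- Hypothetical grammar type, named after the constructors in the question.
data ValidFormat
  = Alpha                          -- one or more letters (assumption)
  | Numeric                        -- one or more digits (assumption)
  | Concat ValidFormat ValidFormat -- the first format, then the second
  deriving Show

-- Translate a grammar value into a ReadP parser. ReadP tries every way of
-- splitting the input, so Concat Alpha Alpha can still match "ab".
toParser :: ValidFormat -> ReadP ()
toParser Alpha        = skipMany1 (satisfy isAlpha)
toParser Numeric      = skipMany1 (satisfy isDigit)
toParser (Concat a b) = toParser a *> toParser b

-- A string is valid when at least one parse consumes the whole input.
validate :: ValidFormat -> String -> Bool
validate f s = not (null (readP_to_S (toParser f <* eof) s))

-- Letters followed by digits, e.g. "customer123".
varNameFormat :: ValidFormat
varNameFormat = Concat Alpha Numeric
```

With these definitions, validate varNameFormat "customer123" is True and validate varNameFormat "123customer" is False. The empty string is rejected without a special case, because Alpha needs at least one letter.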

You might also find this link useful for a tutorial: http://www.seas.upenn.edu/~cis194/fall14/spring13/lectures.html (week 10 and 11, including the assignments where you basically write your own little parser library)

Upvotes: 1
