Abraham P

Reputation: 15471

Haskell regular expression behaving differently

Say I have the following regex to validate that dates are in the format 'YYYY-MM-DD':

([12]\d{3}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01]))

With any number of online regex testers (I like https://regex101.com/), it is easy to confirm that this works: e.g. 2055-12-12 matches, while 2055-13-25 does not.

Now in Haskell (GHCi):

  :set -XOverloadedStrings
  import Data.Text (Text)
  import qualified Data.Text.IO as T
  import Text.Regex.TDFA
  import Text.Regex.TDFA.Text ()
  let dateRegex = "([12]\\d{3}-(0[1-9]|1[0-2])-(0[1-9]|[12]\\d|3[01]))" :: Text
  let fromDate' = "2055-12-12" :: Text
  T.putStrLn (fromDate' =~ dateRegex :: Text)

prints the empty string, i.e. the regex fails to match. I have no idea why; any help would be appreciated.

Upvotes: 1

Views: 98

Answers (2)

James Brock

Reputation: 3426

In Haskell, the usual advice is "Parse, don't validate": rather than checking the string with a regex, parse it directly into a Day.

import Control.Monad (replicateM)
import Data.Time.Calendar (Day, fromGregorianValid)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (digitChar)
import Text.Read (readMaybe)

-- Read fixed-width runs of digits, then let fromGregorianValid reject
-- impossible dates such as 2020-13-01 or 2020-02-30.
parseDay :: Parsec Void String Day
parseDay = do
  y <- maybe (fail "bad year") pure =<< readMaybe <$> replicateM 4 digitChar
  _ <- chunk "-"
  m <- maybe (fail "bad month") pure =<< readMaybe <$> replicateM 2 digitChar
  _ <- chunk "-"
  d <- maybe (fail "bad day") pure =<< readMaybe <$> replicateM 2 digitChar
  maybe (fail "not a Gregorian calendar day") pure $ fromGregorianValid y m d

ghci> runParser parseDay "" "2020-11-30"
Right 2020-11-30
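A minimal sketch of using the parser on good and bad input, assuming the megaparsec and time packages are installed. It calls parseMaybe, which (unlike runParser above) also requires the whole input to be consumed, so trailing characters are rejected. The parser is the one from this answer, repeated so the block compiles on its own.

```haskell
-- Assumes the megaparsec and time packages are available.
import Control.Monad (replicateM)
import Data.Time.Calendar (Day, fromGregorian, fromGregorianValid)
import Data.Void (Void)
import Text.Megaparsec (Parsec, chunk, parseMaybe)
import Text.Megaparsec.Char (digitChar)
import Text.Read (readMaybe)

-- The parser from the answer, repeated so this block is self-contained.
parseDay :: Parsec Void String Day
parseDay = do
  y <- maybe (fail "bad year") pure =<< readMaybe <$> replicateM 4 digitChar
  _ <- chunk "-"
  m <- maybe (fail "bad month") pure =<< readMaybe <$> replicateM 2 digitChar
  _ <- chunk "-"
  d <- maybe (fail "bad day") pure =<< readMaybe <$> replicateM 2 digitChar
  maybe (fail "not a Gregorian calendar day") pure $ fromGregorianValid y m d

main :: IO ()
main = do
  -- A real date parses to the corresponding Day.
  print (parseMaybe parseDay "2020-11-30" == Just (fromGregorian 2020 11 30))
  -- Well-formed but impossible dates are rejected by fromGregorianValid.
  print (parseMaybe parseDay "2020-13-01")
  print (parseMaybe parseDay "2020-02-30")
  -- parseMaybe demands end of input, so trailing text is rejected too.
  print (parseMaybe parseDay "2020-11-30x")
```

This should print True followed by three lines of Nothing. Note that 2020-02-30 is rejected here even though it has the right shape, which a purely syntactic regex cannot do.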

Upvotes: 0

moonGoose

Reputation: 1510

\d meaning "digit" is a feature of Perl-style regular expressions. The regex-tdfa docs state that it implements POSIX extended regular expressions, not Perl-style ones, so \d does not match a digit there. Your choices are therefore to rewrite the pattern with POSIX character classes, i.e. using [[:digit:]]:

dateRegex = "([12][[:digit:]]{3}-(0[1-9]|1[0-2])-(0[1-9]|[12][[:digit:]]|3[01]))"

or to use the regex-pcre package instead (import Text.Regex.PCRE), which accepts your original Perl-style regex.
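A minimal sketch of the first option, assuming the regex-tdfa package and using String input rather than the question's Text (the pattern is the same either way). The ^ and $ anchors are an addition, not part of this answer: without them, =~ also succeeds when a date merely appears inside a longer string. The hypothetical helper isDate is for illustration only.

```haskell
-- Assumes the regex-tdfa package; uses String rather than the question's Text.
import Text.Regex.TDFA ((=~))

-- The POSIX pattern from this answer, wrapped in ^...$ (an addition, not in
-- the answer) so the whole input must be a date, not merely contain one.
dateRegex :: String
dateRegex = "^([12][[:digit:]]{3}-(0[1-9]|1[0-2])-(0[1-9]|[12][[:digit:]]|3[01]))$"

-- Hypothetical helper: True when the entire string matches the pattern.
isDate :: String -> Bool
isDate s = s =~ dateRegex

main :: IO ()
main = mapM_ (print . isDate)
  [ "2055-12-12"   -- valid
  , "2055-13-25"   -- month 13: rejected
  , "x2055-12-12"  -- rejected only because of the added anchors
  , "2055-02-31"   -- accepted: the regex does not know month lengths
  ]
```

This should print True, False, False, True. The last case shows that the regex only checks the shape of the string; rejecting 2055-02-31 needs a real calendar check such as fromGregorianValid in the Megaparsec answer above.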

Upvotes: 2
