wang kai

Reputation: 1747

How do I capture groups from a string with a Haskell regex?

With the module Text.Regex.Posix, I can check whether a string matches a regular expression, but I don't know how to capture parts of the string.

For example, I can capture three elements with FSharpx like this:

Match @"(?i:MAIL\s+FROM:\s*<([a-zA-Z0-9]+)@([a-zA-Z0-9]+(\.[a-zA-Z0-9]+)+)>\s*(SIZE=([0-9]+))*)" mailMatch -> 

I can then get:

([a-zA-Z0-9]+) by mailMatch.Groups.[0].ToString() 
([a-zA-Z0-9]+(\.[a-zA-Z0-9]+)+) by mailMatch.Groups.[1].ToString() 
([0-9]+))* by mailMatch.Groups.[2].ToString() 

but I don't know how to do this in Haskell.

I need an example, thanks!

Upvotes: 2

Views: 956

Answers (1)

willeM_ Van Onsem

Reputation: 476594

First of all, the regex you show is, as far as I know, not a POSIX regex: constructs such as (?i:...) and \s are Perl-style extensions. So you should import Text.Regex.PCRE instead of Text.Regex.Posix, since PCRE supports this more extended syntax.

Secondly, backslashes must be escaped inside a Haskell string literal (\s is not a valid escape sequence there), so you should rewrite:

regex = "(?i:MAIL\s+FROM:\s*<([a-zA-Z0-9]+)@([a-zA-Z0-9]+(\.[a-zA-Z0-9]+)+)>\s*(SIZE=([0-9]+))*)"

into:

regex = "(?i:MAIL\\s+FROM:\\s*<([a-zA-Z0-9]+)@([a-zA-Z0-9]+(\\.[a-zA-Z0-9]+)+)>\\s*(SIZE=([0-9]+))*)"
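As a quick check of the escaping, here is a minimal sketch that uses only the Prelude. The name fragment is hypothetical and only holds a shortened piece of the regex above. It shows that the doubled backslash in the source is a single backslash in the actual string, which is what the regex engine receives.

```haskell
-- Hypothetical name, used only for this illustration.
fragment :: String
fragment = "MAIL\\s+FROM"

main :: IO ()
main = do
  -- putStrLn writes the string's characters as they are, without re-escaping
  putStrLn fragment
  -- the literal "\\s" holds two characters: a backslash and the letter s
  print (length "\\s")
```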

Now we can use the (=~) operator:

Prelude Text.Regex.PCRE> "MAIL  FROM: <foo@bar.com> SIZE=1" =~ regex :: [[String]]
[["MAIL  FROM: <foo@bar.com> SIZE=1","foo","bar.com",".com","SIZE=1","1"]]

Here we specify that the result is a list of lists of strings, [[String]]. Every sublist corresponds to one match of the regex, so if the regex matches three times in the text, we get three sublists. Each sublist contains the captures: the first element is the full match, the second element is capture group 1, and so on.

If you know for sure that there will only be one match, you can for instance use:

[[_,user,domain,topdomain,_,size]] = "MAIL  FROM: <foo@bar.com> SIZE=1" =~ regex :: [[String]]

Then the result is:

Prelude Text.Regex.PCRE> user
"foo"
Prelude Text.Regex.PCRE> domain
"bar.com"
Prelude Text.Regex.PCRE> topdomain
".com"
Prelude Text.Regex.PCRE> size
"1"

Mind that this kind of pattern matching is partial: it raises a pattern-match failure at runtime when the result is not exactly one match with six elements. In a real program you had better use a safer, total solution.
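One possible total variant is sketched below. It is not part of the original approach: the MailFrom record and the fromCaptures function are hypothetical names introduced here. The function takes the [[String]] result of (=~) as its argument, so the block itself needs only base. It assumes that an optional group that did not take part in the match shows up as an empty string, in which case readMaybe yields Nothing for the size.

```haskell
import Text.Read (readMaybe)

-- Hypothetical record for the captured parts; not from the original answer.
data MailFrom = MailFrom
  { mfUser   :: String
  , mfDomain :: String
  , mfSize   :: Maybe Int  -- Nothing when no numeric size was captured
  } deriving (Show, Eq)

-- Total replacement for the [[_,user,domain,topdomain,_,size]] binding:
-- any result that is not exactly one match with six elements gives Nothing
-- instead of a runtime pattern-match failure.
fromCaptures :: [[String]] -> Maybe MailFrom
fromCaptures [[_, user, domain, _, _, size]] =
  Just (MailFrom user domain (readMaybe size))
fromCaptures _ = Nothing

main :: IO ()
main = do
  -- the capture list shown in the GHCi session above, passed in directly
  print (fromCaptures
    [["MAIL  FROM: <foo@bar.com> SIZE=1","foo","bar.com",".com","SIZE=1","1"]])
  -- no match at all: the result is Nothing rather than a crash
  print (fromCaptures [])
```

With Text.Regex.PCRE imported and regex defined as above, it would be called as fromCaptures (input =~ regex), where input is the line to parse.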

Upvotes: 5
