I want to adapt the Python regex (PCRE) technique from the SO question "Find string between two substrings" to Haskell.
But I can't figure out how to make it work in GHC (8.2.1). I installed the library with cabal install regex-pcre and, after some searching, came up with the following test code:
import Text.Regex.PCRE
s = "+++asdf=5;iwantthis123jasd---"
result = (s ++ s) =~ "asdf=5;(.*)123jasd" :: [[String]]
I was hoping to get the first and last instance of the middle string, iwantthis.
But I can't get the result right:
[["asdf=5;iwantthis123jasd---+++asdf=5;iwantthis123jasd","iwantthis123jasd---+++asdf=5;iwantthis"]]
I haven't used regex or pcre in Haskell before.
Can someone help with the right usage (to extract the first and last occurrence)?
Also, I don't quite understand the :: [[String]] usage here. What does it do, and why is it necessary?
I searched the documentation but found no mention of using =~ with a type conversion to :: [[String]].
The result you obtain is the following:
Prelude Text.Regex.PCRE> (s ++ s) =~ "asdf=5;(.*)123jasd" :: [[String]]
[["asdf=5;iwantthis123jasd---+++asdf=5;iwantthis123jasd","iwantthis123jasd---+++asdf=5;iwantthis"]]
This is correct: the first element is the implicit capture group 0 (the entire match), and the second element is capture group 1 (the one that matches (.*)). The match spans the doubled string from the first asdf=5; to the last 123jasd:
+++asdf=5;iwantthis123jasd---+++asdf=5;iwantthis123jasd---
So the capture is still the text between an asdf=5; and a 123jasd, just not the nearest pair.
This is because the Kleene star * matches greedily: it tries to capture as much as possible. You can use (.*?) instead, which is a non-greedy quantifier:
Prelude Text.Regex.PCRE> (s ++ s) =~ "asdf=5;(.*?)123jasd" :: [[String]]
[["asdf=5;iwantthis123jasd","iwantthis"],["asdf=5;iwantthis123jasd","iwantthis"]]
And now we obtain two matches. Each match has "iwantthis" as capture group 1.
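As for the :: [[String]] part: it is a type annotation rather than a conversion. The =~ operator is polymorphic in its return type, so the annotation tells GHC which kind of result you want. The following is a minimal sketch of that idea, assuming regex-pcre is installed; the names s and pat are just for illustration, and only three of the available result types are shown.

```haskell
import Text.Regex.PCRE ((=~))

-- Sample input from the question.
s :: String
s = "+++asdf=5;iwantthis123jasd---"

-- Non-greedy pattern (illustrative name, not part of the library).
pat :: String
pat = "asdf=5;(.*?)123jasd"

main :: IO ()
main = do
  -- Bool: does the pattern match anywhere in the input?
  print ((s ++ s) =~ pat :: Bool)
  -- String: the text of the first match only.
  putStrLn ((s ++ s) =~ pat :: String)
  -- [[String]]: every match, each as [group 0, group 1, ...].
  print ((s ++ s) =~ pat :: [[String]])
```

Without an annotation (or some other context that fixes the result type), GHC cannot decide which of these results is meant, which is why the annotation is needed in your test code.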
You can use map (head . tail) or map (!!1) on it to obtain a list of the captures of the (.*?) part:
Prelude Text.Regex.PCRE> map (!!1) ((s ++ s) =~ "asdf=5;(.*?)123jasd" :: [[String]])
["iwantthis","iwantthis"]
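To get specifically the first and last occurrence, one option is to post-process that list of matches. This is only a sketch using base: captureOne and firstAndLast are hypothetical helper names (not part of regex-pcre), and the input is assumed to have the [[String]] shape shown above, with capture group 1 as the second element of each match.

```haskell
import Data.Maybe (mapMaybe)

-- Hypothetical helper: safely pick capture group 1 from a single match,
-- where a match has the form [group 0, group 1, ...].
captureOne :: [String] -> Maybe String
captureOne (_ : g1 : _) = Just g1
captureOne _            = Nothing

-- Hypothetical helper: the first and last capture-group-1 values over
-- all matches, or Nothing when there is no match with a group 1.
firstAndLast :: [[String]] -> Maybe (String, String)
firstAndLast matches =
  case mapMaybe captureOne matches of
    []         -> Nothing
    cs@(c : _) -> Just (c, last cs)
```

With Text.Regex.PCRE loaded, firstAndLast ((s ++ s) =~ "asdf=5;(.*?)123jasd") should then yield Just ("iwantthis","iwantthis"). The argument type of firstAndLast fixes the result type of =~, so no explicit annotation is needed there. Unlike (!!1), this version does not fail on a match that has no capture group.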