Reputation: 95
I need to extract information from a list of strings (read from a file) and return the result as a list of matched lines. The function prototype I wrote is as follows:
extractRegions :: [String] -> [String]
extractRegions list = undefined -- not sure about definition
I understand that the Text.Regex.Posix library is recommended for this, but I can't find any information on using it within a Haskell source file, and the API documentation is confusing. I would like to extract regions that begin with one word and end with another, e.g. begin with "Start" and end with "Finish", with the text in between also matched by the regular expression.
How should I address this simple idea in Haskell?
Many thanks
Upvotes: 1
Views: 478
Reputation: 74374
The regex-compat package is significantly easier to start with. Text.Regex.Posix may be the weapon you turn to eventually, but it has a more confusing interface due to its very general overloading of (=~).
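As a minimal sketch of the regex-compat route, assuming the "Start" and "Finish" markers from the question and the Text.Regex module that regex-compat provides: the pattern names regionPattern and innerPattern and the helper innerText are made up for illustration.

```haskell
import Data.Maybe (isJust)
import Text.Regex (Regex, matchRegex, mkRegex)

-- Hypothetical pattern: "Start", any text, then "Finish" somewhere in the line.
regionPattern :: Regex
regionPattern = mkRegex "Start.*Finish"

-- Same pattern with a capture group around the text between the markers.
innerPattern :: Regex
innerPattern = mkRegex "Start(.*)Finish"

-- Keep only the lines in which the pattern matches.
extractRegions :: [String] -> [String]
extractRegions = filter (isJust . matchRegex regionPattern)

-- Return the captured in-between text, if the line matches at all.
-- matchRegex yields the list of subexpression matches on success.
innerText :: String -> Maybe String
innerText line = case matchRegex innerPattern line of
  Just (inner : _) -> Just inner
  _                -> Nothing
```

With this sketch, extractRegions keeps every line that contains a "Start ... Finish" span and drops the rest. Note that `.*` is greedy, so on a line with several "Finish" markers the capture runs to the last one.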
Beyond that, the "Haskelly" way to handle this kind of problem is to create a type that represents the information in each line of your argument to extractRegions (let's call it Line for argument's sake) and then create a parser
data Line = Line { ..., region :: Region, ... }
data Region = Region { ... }
parseLine :: String -> Maybe Line
using a library like Parsec or Attoparsec. From there, we can extract the information we need from the Region type very easily by using the region record accessor function.
map region :: [Line] -> [Region]
and then combine these pieces to get the complete picture
extractRegions :: [String] -> [Region]
extractRegions input = case sequence (map parseLine input) of
  Nothing     -> error "One of our line parses failed!"
  Just parsed -> map region parsed
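To make the parser approach concrete, here is one possible sketch using Parsec. The fields of Line and Region are assumptions (the original leaves them elided), as are the "Start" and "Finish" markers taken from the question. Unlike the version above, this sketch skips lines that fail to parse instead of raising an error, which is a design choice rather than a requirement.

```haskell
import Data.Maybe (mapMaybe)
import Text.Parsec (anyChar, manyTill, parse, string, try)
import Text.Parsec.String (Parser)

-- Hypothetical Region: just the text between the two markers.
newtype Region = Region { regionText :: String }
  deriving (Eq, Show)

-- Hypothetical Line: whatever precedes "Start", plus the region itself.
data Line = Line { prefix :: String, region :: Region }
  deriving (Eq, Show)

-- Consume characters up to "Start", then collect everything up to the
-- first following "Finish". Anything after "Finish" is ignored.
lineParser :: Parser Line
lineParser = do
  before <- manyTill anyChar (try (string "Start"))
  inner  <- manyTill anyChar (try (string "Finish"))
  return (Line before (Region inner))

-- Turn a Parsec failure into Nothing.
parseLine :: String -> Maybe Line
parseLine s = either (const Nothing) Just (parse lineParser "" s)

-- Keep the regions of the lines that parse; drop the others.
extractRegions :: [String] -> [Region]
extractRegions = map region . mapMaybe parseLine
```

Because manyTill stops at the first "Finish", this parser is non-greedy, which differs from a regular expression such as `Start.*Finish`.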
Upvotes: 1