user1272525

Reputation: 95

Extracting regions of text from list of strings with Haskell

I need to extract information from a list of strings (read from a file) and return the result as a list of matched lines. My function prototype so far is:

extractRegions :: [String] -> [String]
extractRegions list = undefined -- not sure about definition

I understand that the Text.Regex.Posix library is recommended for this, but I can't find any information on using it from a Haskell source file, and the API documentation is confusing. I would like to extract regions that begin with one word and end with another, e.g. beginning with "Start" and ending with "Finish", including the text in between.

How should I approach this in Haskell?

Many thanks

Upvotes: 1

Views: 478

Answers (1)

J. Abrahamson

Reputation: 74374

The regex-compat package (module Text.Regex) is significantly easier to start with. Text.Regex.Posix may be the weapon you turn to eventually, but it has a more confusing interface due to its very general overloading of (=~).

Beyond that, the "Haskelly" way to handle this kind of problem is to create a type that represents the information in each line of the input to extractRegions (let's call it Line, for argument's sake) and then write a parser
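As a rough illustration of the regex-compat route, here is a minimal sketch. It assumes the regex-compat package is installed and that the markers are the literal words "Start" and "Finish" from the question. The names regionRegex and matchRegion are made up for this example.

```haskell
import Data.Maybe (mapMaybe)
import Text.Regex (Regex, matchRegex, mkRegex)

-- One capture group spanning the whole region, "Start" through "Finish".
-- POSIX ".*" is greedy, so the match runs to the last "Finish" on the line.
regionRegex :: Regex
regionRegex = mkRegex "(Start.*Finish)"

-- matchRegex returns the text of each capture group on success,
-- so the first element is the captured region.
matchRegion :: String -> Maybe String
matchRegion line = case matchRegex regionRegex line of
  Just (r : _) -> Just r
  _            -> Nothing

-- Keep only the regions from lines that contain one; other lines are dropped.
extractRegions :: [String] -> [String]
extractRegions = mapMaybe matchRegion
```

If you want the whole matching line rather than just the region, one option is to filter the input with a predicate such as isJust . matchRegion instead.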

data Line   = Line   { ..., region :: Region, ... }
data Region = Region { ... }

parseLine :: String -> Maybe Line

using a library like Parsec or Attoparsec. From there, we can extract the information we need from the Region type very easily by using the region record accessor function.

map region :: [Line] -> [Region]

and then combine these pieces to get the complete picture

extractRegions :: [String] -> [Region]
-- mapM yields Nothing as soon as any single line fails to parse
extractRegions input = case mapM parseLine input of
  Nothing     -> error "One of our line parses failed!"
  Just parsed -> map region parsed
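To make the outline above concrete, here is one possible end-to-end sketch. It uses ReadP from base as a stand-in for Parsec or Attoparsec so that it runs without extra packages; the overall shape should carry over. The fields of Line and Region, the parser lineP, and the "Start"/"Finish" markers are assumptions for illustration, since the types above leave their fields elided. This version also returns Maybe rather than calling error, which is a design choice and not part of the original outline.

```haskell
import Data.Maybe (mapMaybe)
import Text.ParserCombinators.ReadP (ReadP, get, manyTill, munch, readP_to_S, string)

-- Hypothetical concrete types; the real fields depend on your data.
newtype Region = Region { regionText :: String }
  deriving (Eq, Show)

data Line = Line
  { rawLine :: String  -- the original, unparsed line
  , region  :: Region  -- the text found between the two markers
  } deriving (Eq, Show)

-- Skip to the first "Start", capture up to the next "Finish",
-- then consume whatever is left of the line.
lineP :: ReadP Region
lineP = do
  _    <- manyTill get (string "Start")
  body <- manyTill get (string "Finish")
  _    <- munch (const True)
  pure (Region body)

parseLine :: String -> Maybe Line
parseLine s = case readP_to_S lineP s of
  ((r, _) : _) -> Just (Line s r)
  []           -> Nothing

-- Strict variant: Nothing if any line fails to parse.
extractRegions :: [String] -> Maybe [Region]
extractRegions = fmap (map region) . mapM parseLine

-- Lenient variant: skip lines that contain no region.
extractRegionsLenient :: [String] -> [Region]
extractRegionsLenient = map region . mapMaybe parseLine
```

Which variant fits depends on whether a line without a region counts as bad input or is simply uninteresting. Here the captured text excludes the markers themselves, and leading or trailing spaces inside the region are kept.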

Upvotes: 1
