How to use map with a regular expression in Haskell

I get an error when I try to apply the regular expression below to each element of a list (test). This is what I am trying in ghci:

import Text.Regex.TDFA
let test = ["<lobbying_firm>The CrisCom Company</lobbying_firm>","<registration_year>2013</registration_year>"]
let regExp = "\\>(.)*\\<"
let result = map (=~ regExp :: String) test

How can I do this? Any ideas?

Upvotes: 3

Views: 94


willeM_ Van Onsem

Reputation: 476493

You are quite close. The only problem is your type annotation. What you pass to map must be a function, so the type you want for (=~ regExp) is probably String -> String: a function that takes a String as its parameter and returns a String. It is not a String itself.

We can thus write the map as:

result = map ((=~ regExp) :: String -> String) test

This produces:

Prelude Text.Regex.TDFA> map ((=~ regExp) :: String -> String) test
[">The CrisCom Company</",">2013</"]
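Outside ghci, the same fix can be sketched as a small compiled program. This is a sketch assuming the regex-tdfa package is installed; the names sample, tagContents, wholeMatches and innerTexts are hypothetical. The pattern >([^<]*)< is a substitute for the question's regex, not the original: in a POSIX extended regex > and < are ordinary characters that need no backslash, [^<]* stops at the next <, and the capture group gives you the text without the delimiters.

```haskell
-- Sketch only: assumes the regex-tdfa package is available.
import Text.Regex.TDFA ((=~))

-- The sample input from the question.
sample :: [String]
sample =
  [ "<lobbying_firm>The CrisCom Company</lobbying_firm>"
  , "<registration_year>2013</registration_year>"
  ]

-- Hypothetical pattern: '>' and '<' are plain characters in POSIX extended
-- regexes; [^<]* cannot run past the next '<'; the group captures the text.
tagContents :: String
tagContents = ">([^<]*)<"

-- The signature fixes the result type of (=~) to String, which is the job
-- the ":: String -> String" annotation does in the ghci session.
wholeMatches :: [String] -> [String]
wholeMatches = map (=~ tagContents)

-- Asking (=~) for a (before, match, after, groups) tuple exposes the
-- capture group, i.e. the text without the surrounding '>' and '<'.
innerTexts :: [String] -> [String]
innerTexts = map firstGroup
  where
    firstGroup :: String -> String
    firstGroup s = case s =~ tagContents :: (String, String, String, [String]) of
      (_, _, _, g : _) -> g
      _                -> ""  -- no match: fall back to an empty string

main :: IO ()
main = do
  print (wholeMatches sample)  -- [">The CrisCom Company<",">2013<"]
  print (innerTexts sample)    -- ["The CrisCom Company","2013"]
```

The result type drives what (=~) returns, so choosing String, a tuple, or another instance is purely a matter of the annotation or signature.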

That being said, I strongly advise against parsing HTML, XML, JSON, etc. with a regex. Regexes cannot parse HTML and other recursive languages; this is a consequence of the Pumping lemma for regular languages [wiki]. A regex can never fully parse HTML. You can parse some sublanguages, but even then the regex easily gets (very) complicated. You are therefore better off using a library like tagsoup [hackage], or a scraper library like scalpel [hackage].
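The tagsoup route can be sketched as follows, assuming the tagsoup package is available; sample and textOf are hypothetical names. parseTags splits each string into open tags, text and close tags, and innerText keeps only the text nodes, so no pattern is needed at all.

```haskell
-- Sketch only: assumes the tagsoup package is available.
import Text.HTML.TagSoup (innerText, parseTags)

-- The sample input from the question.
sample :: [String]
sample =
  [ "<lobbying_firm>The CrisCom Company</lobbying_firm>"
  , "<registration_year>2013</registration_year>"
  ]

-- Tokenise the markup, then concatenate the text nodes between the tags.
textOf :: String -> String
textOf = innerText . parseTags

main :: IO ()
main = print (map textOf sample)  -- ["The CrisCom Company","2013"]
```

For nested markup innerText concatenates all text nodes, so a real scraper would usually select the relevant tags first.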

Upvotes: 2
