Reputation: 8093
I am trying to do some simple text processing in Haskell, and I am wondering what might be the best way to go about this in an FP language. I looked at the parsec module, but it seems much more sophisticated than what I am looking for as a new Haskeller. What would be the best way to strip all the punctuation from a corpus of text? My naive approach was to write a function like this:
removePunc str = [c | c <- str, c /= '.',
                                c /= '?',
                                c /= ',',
                                c /= '!',
                                c /= '-',
                                c /= ';',
                                c /= '\'',
                                c /= '\"']
Upvotes: 8
Views: 1339
Reputation: 27629
You can group your characters in a String and use notElem:
[c | c <- str, c `notElem` ".?!,-;"]
or in a more functional style:
filter (\c -> c `notElem` ".?!,") str
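As a minimal sketch, the two forms above can be written as complete top-level functions and compared. The names punctuationChars, removePuncComp and removePuncFilter are hypothetical, and the exact set of characters is an assumption taken from the question:

```haskell
-- Characters to strip. This particular set is an assumption based on the
-- question; adjust it for your corpus.
punctuationChars :: String
punctuationChars = ".?!,-;'\""

-- List-comprehension form: keep every character not in punctuationChars.
removePuncComp :: String -> String
removePuncComp str = [c | c <- str, c `notElem` punctuationChars]

-- The same behaviour expressed with filter and a lambda.
removePuncFilter :: String -> String
removePuncFilter str = filter (\c -> c `notElem` punctuationChars) str
```

Both functions return "Hello world Its fine" for the input "Hello, world! It's \"fine\".".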
Upvotes: 4
Reputation: 102216
A possibly more efficient method (O(log n) rather than O(n) per lookup, where n is the number of punctuation characters) is to use a Set (from Data.Set):
import qualified Data.Set as S
punctuation = S.fromList ".?!,-;'\""
removePunc = filter (`S.notMember` punctuation)
Construct the set outside the function so that it is computed only once and shared across all calls; the overhead of building the set is much larger than the simple linear-time notElem test that others have suggested.
Note: the set of characters here is so small that the extra overhead of a Set might outweigh its asymptotic benefit over a list, so if absolute performance matters, profile both versions.
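For anyone who wants to profile the two approaches, here is a rough sketch that puts the Set-based and list-based versions side by side with type signatures. The names punctuationChars, punctuationSet, removePuncSet and removePuncList are hypothetical, and the character set is an assumption taken from the question:

```haskell
import qualified Data.Set as S

-- Characters to strip (an assumed set, taken from the question).
punctuationChars :: String
punctuationChars = ".?!,-;'\""

-- Built once at the top level, so it is shared across all calls.
punctuationSet :: S.Set Char
punctuationSet = S.fromList punctuationChars

-- Set-based version: each membership test is O(log n) in the set size.
removePuncSet :: String -> String
removePuncSet = filter (`S.notMember` punctuationSet)

-- List-based version: each membership test is O(n) in the list length.
removePuncList :: String -> String
removePuncList = filter (`notElem` punctuationChars)
```

The two functions should agree on every input, so either can be swapped in once profiling shows which is faster for a given corpus.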
Upvotes: 11
Reputation: 479
You can write your code more simply:
removePunc = filter (`notElem` ".?!-;\'\"")
or
removePunc = filter (flip notElem ".?!-;\'\"")
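The two definitions above are equivalent: a backtick section with the right operand filled in leaves the left operand open, which is what flip does for a prefix function. A small sketch with type signatures, where the names punct, removePuncSection and removePuncFlip are hypothetical:

```haskell
-- Characters to strip, as in the answer above.
punct :: String
punct = ".?!-;'\""

-- Operator section: (`notElem` punct) means \c -> c `notElem` punct.
removePuncSection :: String -> String
removePuncSection = filter (`notElem` punct)

-- flip swaps the argument order of notElem, so that
-- flip notElem punct c == notElem c punct.
removePuncFlip :: String -> String
removePuncFlip = filter (flip notElem punct)
```

Note that this character list has no comma, so commas are kept: both functions turn "Wait... what?!" into "Wait what".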
Upvotes: 8