turtle

Reputation: 8093

What approach for simple text processing in Haskell?

I am trying to do some simple text processing in Haskell, and I am wondering what the best way to go about this might be in an FP language. I looked at the Parsec library, but it seems much more sophisticated than what I need as a new Haskeller. What would be the best way to strip all the punctuation from a corpus of text? My naive approach was to write a function like this:

removePunc str = [c | c <- str, c /= '.',
                                 c /= '?',
                                 c /= ',',
                                 c /= '!',
                                 c /= '-',
                                 c /= ';',
                                 c /= '\'',
                                 c /= '\"']

Upvotes: 8

Views: 1339

Answers (3)

Daniel

Reputation: 27629

You can group your characters in a String and use notElem:

[c | c <- str, c `notElem` ".?!,-;"]

or in a more functional style:

filter (\c -> c `notElem` ".?!,") str
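
As a minimal sketch of how the two forms above line up, here they are as complete functions with type signatures. The names punctuationChars, removePuncLC, and removePuncFilter are made up for illustration, and the exact set of characters is an assumption; adjust it to whatever your corpus needs.

```haskell
-- Characters to strip. This particular selection is an assumption
-- for illustration, not a definitive list of punctuation.
punctuationChars :: String
punctuationChars = ".?!,-;'\""

-- List-comprehension form: keep every character that is not in the list.
removePuncLC :: String -> String
removePuncLC str = [c | c <- str, c `notElem` punctuationChars]

-- Equivalent form using filter with an explicit lambda.
removePuncFilter :: String -> String
removePuncFilter str = filter (\c -> c `notElem` punctuationChars) str
```

Both functions should give the same result for any input; for example, removePuncLC "Hello, world! It's fine." evaluates to "Hello world Its fine".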

Upvotes: 4

huon

Reputation: 102216

A possibly more efficient method (O(log n) per character rather than O(n), where n is the number of punctuation characters) is to use a Set (from Data.Set):

import qualified Data.Set as S

punctuation = S.fromList ".?!,-;'\""

removePunc = filter (`S.notMember` punctuation)

Construct the set outside the function so that it is computed only once and shared across all calls: building the set costs far more than a single linear-time notElem test like the ones suggested in the other answers.

Note: the punctuation list is so small that the extra overhead of a Set might outweigh its asymptotic advantage over a plain list, so if you need absolute performance, profile both versions.
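
To make the comparison concrete, here is a rough sketch that puts the Set-based and list-based versions side by side with explicit type signatures. The names punctuationSet, removePuncSet, and removePuncList are hypothetical, and the character selection is an assumption. Data.Set comes from the containers package, which ships with GHC.

```haskell
import qualified Data.Set as S

-- Top-level constant, so the set is built once and shared by every call.
-- The characters chosen here are an assumption for illustration.
punctuationSet :: S.Set Char
punctuationSet = S.fromList ".?!,-;'\""

-- Set-based version: each membership test is O(log n) in the set size.
removePuncSet :: String -> String
removePuncSet = filter (`S.notMember` punctuationSet)

-- List-based version for comparison: each test is O(n) in the list length.
removePuncList :: String -> String
removePuncList = filter (`notElem` ".?!,-;'\"")
```

The two functions should agree on every input, so either one can be swapped in when benchmarking to see which is faster on a real corpus.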

Upvotes: 11

Ronson

Reputation: 479

You can simply write your code as:

removePunc = filter (`notElem` ".?!-;\'\"")

or

removePunc = filter (flip notElem ".?!-;\'\"")
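
Since the question is about a whole corpus, here is one possible way to apply the point-free filter above to everything on standard input. This is only a sketch: stripStdin is a made-up helper name, and it relies on the standard Prelude function interact, which lazily reads all of stdin, applies a String -> String function, and writes the result to stdout.

```haskell
-- Point-free punctuation filter, using the same characters as the answer above.
removePunc :: String -> String
removePunc = filter (`notElem` ".?!-;'\"")

-- Hypothetical helper: read all of stdin, strip punctuation, write to stdout.
stripStdin :: IO ()
stripStdin = interact removePunc
```

With main = stripStdin in a compiled program, the text could be piped through from the shell, for example by redirecting a corpus file into the executable.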

Upvotes: 8
