user1522145

Reputation: 11

Parsing out words in a string

I hope I was clear about my question!

Any help would be appreciated!

Upvotes: 0

Views: 243

Answers (3)

dave4420

Reputation: 47042

As you are learning, here's how to do it from scratch.

import qualified Data.Set as S

First, the set of word boundaries:

wordBoundaries :: S.Set Char
wordBoundaries = S.fromList " ."

(Data.Set.fromList takes a list of elements; [Char] is the same as String, which is why we can pass a string in this case.)
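A quick check of that point, assuming only GHC and the containers package that ships with it: building the set from the string literal gives the same set as building it from an explicit list of characters.

```haskell
import qualified Data.Set as S

-- Same definition as above; the literal " ." is just the list [' ', '.'].
wordBoundaries :: S.Set Char
wordBoundaries = S.fromList " ."

main :: IO ()
main = do
  print (wordBoundaries == S.fromList [' ', '.'])  -- True
  print (S.member '.' wordBoundaries)              -- True
  print (S.member 'a' wordBoundaries)              -- False
  print (S.toList wordBoundaries)                  -- " ." (sorted: space before '.')
```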

Next, splitting a string into words:

toWords :: String -> [String]
toWords = fst . foldr cons ([], True)
  where

The documentation for fst and foldr is pretty clear, but that for . is a bit terse if you've not encountered function composition before.

The argument given to toWords is passed to foldr cons ([], True); (.) then takes the result of foldr cons ([], True) and feeds it to fst. Finally, the result of fst is used as the result of toWords itself.
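If function composition is new to you, here is a small sketch of the same idea on a simpler function. The names compose, countWords, countWords' and countWords'' are hypothetical, chosen just for this illustration: compose is a hand-written stand-in for (.), and the three countWords variants are the same function written point-free, with an explicit argument, and with compose.

```haskell
-- A hand-written equivalent of (.): apply g first, then f.
compose :: (b -> c) -> (a -> b) -> a -> c
compose f g = \x -> f (g x)

-- Point-free style, like toWords: the argument is implicit.
countWords :: String -> Int
countWords = length . words

-- The same function with the argument written out.
countWords' :: String -> Int
countWords' s = length (words s)

-- The same function again, using compose instead of (.).
countWords'' :: String -> Int
countWords'' = compose length words

main :: IO ()
main = print (map ($ "yeah what now") [countWords, countWords', countWords''])
-- [3,3,3]
```

Read toWords the same way: toWords s is fst (foldr cons ([], True) s).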

We still have to define cons:

    cons :: Char -> ([String], Bool) -> ([String], Bool)
    cons ch (words, startNew)
        | S.member ch wordBoundaries = (              words, True)
        | startNew                   = ([ch]        : words, False)
    cons ch (word : words, _)        = ((ch : word) : words, False)

Homework: work out what cons does and how it works. This may be easier if you first ensure you understand how foldr calls it.
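For reference, here are the pieces above assembled into one module you can load in GHCi or compile, so you can experiment while doing the homework. Only main and its sample input are additions; the definitions are unchanged from the answer.

```haskell
import qualified Data.Set as S

-- Characters that separate words.
wordBoundaries :: S.Set Char
wordBoundaries = S.fromList " ."

-- Right fold over the input, starting with no words and "start a new word" set.
toWords :: String -> [String]
toWords = fst . foldr cons ([], True)
  where
    cons :: Char -> ([String], Bool) -> ([String], Bool)
    cons ch (words, startNew)
        | S.member ch wordBoundaries = (              words, True)
        | startNew                   = ([ch]        : words, False)
    cons ch (word : words, _)        = ((ch : word) : words, False)

-- Sample input, added only to exercise the function.
main :: IO ()
main = print (toWords "yeah. what")  -- ["yeah","what"]
```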

Upvotes: 0

Gabriella Gonzalez

Reputation: 35089

You want Data.List.Split (from the split package on Hackage), which covers the vast majority of splitting use cases.

For your example, just use:

splitOneOf ".,!?"

And if you want to get rid of the "empty words" between consecutive delimiters, just use:

filter (not . null) . splitOneOf ".,!?"
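A small sketch of the difference between the two, assuming the split package is installed; the input string is made up for illustration. Note that the delimiter list here does not contain a space, so on an input like "yeah. what?" the second chunk would keep its leading space.

```haskell
-- Requires the split package (not part of base).
import Data.List.Split (splitOneOf)

-- Raw split: every delimiter ends a chunk, so "?!" leaves an empty chunk.
rawChunks :: [String]
rawChunks = splitOneOf ".,!?" "yeah.what?!ok"

-- Same split with the empty chunks filtered out.
cleanChunks :: [String]
cleanChunks = (filter (not . null) . splitOneOf ".,!?") "yeah.what?!ok"

main :: IO ()
main = do
  print rawChunks    -- ["yeah","what","","ok"]
  print cleanChunks  -- ["yeah","what","ok"]
```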

If you want those delimiters to come from a set that you have already stored them in, then just use:

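import Data.List.Split (splitOneOf)  -- import only this, so our split does not clash with Data.List.Split.split
import qualified Data.Set as S

s :: S.Set Char
s = S.fromList ".,!?"  -- example delimiters; use your own set here

split :: String -> [String]
split = filter (not . null) . splitOneOf (S.toList s)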

Upvotes: 1

jtobin

Reputation: 3273

The function words from the Prelude splits a string on whitespace and discards the whitespace for you (Hoogle is a good way to find functions by their type).

Prelude> :t words
words :: String -> [String]
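To see why a filter is still needed, here is what words does on its own: it splits on any run of whitespace and drops it, but punctuation stays attached to the words. The name sampleWords is just for illustration.

```haskell
-- words splits on runs of whitespace; punctuation is left in place.
sampleWords :: [String]
sampleWords = words "  yeah.   what? "

main :: IO ()
main = print sampleWords  -- ["yeah.","what?"]
```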

You just need to compose this with an appropriate filter that makes use of Set. Here's a really basic one:

import Data.Set (Set, fromList, notMember)

parser :: String -> [String]
parser = words . filter (`notMember` delims)
   where delims = fromList ".,!?"

parser "yeah. what?" will return ["yeah", "what"].
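One caveat worth checking: because the filter deletes the delimiters rather than treating them as separators, two words joined only by punctuation get fused together. The sketch below shows that, alongside a hypothetical variant parser' (not part of the answer) that turns each delimiter into a space before calling words.

```haskell
import Data.Set (Set, fromList, member, notMember)

delims :: Set Char
delims = fromList ".,!?"

-- The answer's parser: delete delimiter characters, then split on whitespace.
parser :: String -> [String]
parser = words . filter (`notMember` delims)

-- Variant: replace each delimiter with a space, so punctuation that is
-- not followed by a space still separates words.
parser' :: String -> [String]
parser' = words . map (\c -> if c `member` delims then ' ' else c)

main :: IO ()
main = do
  print (parser  "yeah.what?")  -- ["yeahwhat"]
  print (parser' "yeah.what?")  -- ["yeah","what"]
```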

Check out Learn You A Haskell for some good introductory material.

Upvotes: 2
