Removing whitespace from a string and putting each word in a list (Haskell)

Question

Thanks to the question "Remove white space from string", I can successfully remove the whitespace from a string. In my case, though, I also need to separate the words and put them all in a list, as in the following example.

Input

" A String with many spaces."

would output

["A","String","with","many","spaces."]

I am able to output this

["","A","","","","String","with","many"]

with the following code

> splitWords :: String -> [String]
> splitWords [] =[]
> splitWords as =splitWord "" as


> splitWord _ []  = []
> splitWord word ('\n':as) = word : splitWord "" as
> splitWord word ('\t':as) = word : splitWord "" as
> splitWord word (' ':as)  = word : splitWord "" as
> splitWord word (a:as) = splitWord (word ++ [a]) as

Since I'm trying to learn Haskell, solutions that don't use other libraries would be ideal!

Thomas Francois · Accepted Answer

Do you need to do it yourself? If not, use Data.String.words, which the Prelude also exports, so no import is needed.

λ words " A \t String with many\nspaces."
["A","String","with","many","spaces."] :: [String]

words is defined by:

words   :: String -> [String]
words s =  case dropWhile Char.isSpace s of
                  "" -> []
                  s' -> w : words s''
                        where (w, s'') = break Char.isSpace s'
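
To see how dropWhile and break cooperate in that definition, here is a minimal sketch. It re-implements the function under the hypothetical name myWords (so it does not clash with the Prelude's words) and assumes isSpace is imported unqualified from Data.Char instead of the qualified Char.isSpace shown above. stepExample is also a made-up name that spells out a single step of the recursion.

```haskell
import Data.Char (isSpace)

-- Hypothetical re-implementation of words, renamed to avoid a clash with the Prelude.
myWords :: String -> [String]
myWords s = case dropWhile isSpace s of
  "" -> []                  -- only whitespace was left: no more words
  s' -> w : myWords s''     -- w is the next word, s'' is what follows it
    where (w, s'') = break isSpace s'

-- One step of the recursion for the input "  foo bar":
--   dropWhile isSpace "  foo bar"  gives "foo bar"
--   break isSpace "foo bar"        gives ("foo", " bar")
stepExample :: (String, String)
stepExample = break isSpace (dropWhile isSpace "  foo bar")
```

Because the leading whitespace is dropped before each word is cut off, runs of spaces never produce empty strings.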

Edit: not using Data.String functions.

You were not too far off.

First, you are missing the last word in your output. You can solve that by changing the line splitWord _ [] = [] to splitWord word [] = [word].
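
As a quick check of that first change on its own, here is a sketch in which the function is renamed splitWordStep1 (a hypothetical name, as are the cs and c variables) so it can coexist with the later versions. The sample input has a leading space and a double space, which is where the empty strings in the result come from: every whitespace character emits the current word, even when that word is still "".

```haskell
-- The question's splitWord with only the base case changed.
splitWordStep1 :: String -> String -> [String]
splitWordStep1 word []        = [word]                       -- emit the pending last word
splitWordStep1 word ('\n':cs) = word : splitWordStep1 "" cs
splitWordStep1 word ('\t':cs) = word : splitWordStep1 "" cs
splitWordStep1 word (' ':cs)  = word : splitWordStep1 "" cs
splitWordStep1 word (c:cs)    = splitWordStep1 (word ++ [c]) cs

-- splitWordStep1 "" " A  String"  evaluates to ["","A","","String"]
```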

The next issue is the empty strings that are added to the list. You need to filter them out (I made a top-level function to demonstrate):

addIfNotEmpty :: String -> [String] -> [String]
addIfNotEmpty s l = if s == "" then l else s:l

Using this function:

splitWord word []  = addIfNotEmpty word []
splitWord word ('\n':as) = addIfNotEmpty word $ splitWord "" as
splitWord word ('\t':as) = addIfNotEmpty word $ splitWord "" as
splitWord word (' ':as)  = addIfNotEmpty word $ splitWord "" as
splitWord word (a:as) = splitWord (word ++ [a]) as

And tadaa! It works. But wait, we are not done!

Tidying up

Let's start with splitWords. Not much to do here, but we can use eta-reduction. The separate case for the empty list can also go, because splitWord "" [] already returns []:

splitWords :: String -> [String]
splitWords = splitWord ""

Next, notice that for each type of space, the action is the same. Let's remove the repetition:

splitWord word (c:cs)
    | c `elem` " \t\n" = addIfNotEmpty word $ splitWord "" cs
    | otherwise        = splitWord (word ++ [c]) cs

I used elem here to check whether the next character is whitespace; there are arguably better ways to do it.
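
One possible alternative, sketched here as an assumption about what "better" could mean: isSpace from Data.Char (part of base) also recognises carriage returns, form feeds, vertical tabs, and Unicode spaces, so the guard needs no hand-written character list. The name splitWord' is hypothetical and only keeps this variant apart from the version above.

```haskell
import Data.Char (isSpace)

addIfNotEmpty :: String -> [String] -> [String]
addIfNotEmpty s l = if s == "" then l else s : l

-- Variant of splitWord that tests with isSpace instead of `elem` " \t\n".
splitWord' :: String -> String -> [String]
splitWord' word [] = addIfNotEmpty word []
splitWord' word (c:cs)
    | isSpace c = addIfNotEmpty word $ splitWord' "" cs
    | otherwise = splitWord' (word ++ [c]) cs
```

With this guard, an input containing Windows line endings such as "a\r\nb" splits into "a" and "b", whereas the elem version would leave the '\r' attached to "a".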

Final result:

splitWords :: String -> [String]
splitWords = splitWord ""

splitWord :: String -> String -> [String]
splitWord word [] = addIfNotEmpty word []
splitWord word (c:cs)
    | c `elem` " \t\n" = addIfNotEmpty word $ splitWord "" cs
    | otherwise        = splitWord (word ++ [c]) cs

addIfNotEmpty :: String -> [String] -> [String]
addIfNotEmpty s l = if s == "" then l else s:l
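
An optional refinement, offered as a sketch and not as part of the answer above: word ++ [c] copies the accumulated word for every character, so a long word costs quadratic time. The variant below keeps the current word reversed, prepends each character, and reverses once when the word is emitted. The names splitWordsAcc, go, and emit are hypothetical.

```haskell
-- Hypothetical variant: the accumulator holds the current word in reverse,
-- so each character is prepended in constant time.
splitWordsAcc :: String -> [String]
splitWordsAcc = go ""
  where
    go acc [] = emit acc []
    go acc (c:cs)
      | c `elem` " \t\n" = emit acc (go "" cs)
      | otherwise        = go (c : acc) cs
    -- Same role as addIfNotEmpty, but restores the word's original order.
    emit acc rest = if null acc then rest else reverse acc : rest
```

For inputs that contain only spaces, tabs, and newlines as whitespace, this should agree with the Prelude's words, which makes words a convenient reference to test against.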
