dijam
dijam

Reputation: 668

Haskell recursive function which splits a string of words into individual string

I am trying to learn haskell I am stuck on the following question

Write down a function split :: RawText -> [Word] that splits a given string into a lists of words.

To keep things simple, we assume that the raw text is a string of letters and blanks only, possibly with an irregular spacing. For example: split " one two three ff " = ["one","two","three","ff"]

I keep getting ["one"*** Exception: recursion.hs:30:1-14: Non-exhaustive patterns in function spilt' so the recursion is never ending however I have a base case.

My function

type RawText = String
-- [Char] 'a','b','c','d','e'
type SingleWord = String
type Line = [SingleWord]
type Page = [Line]

--        [-------CHAR-----]
-- split " one two three ff " = ["one","two","three","ff"]
--                              [char],[char],[char],[char]
-- " " is delimiter for a []

split' :: RawText -> [SingleWord]
spilt' [] = []
split' xs
    | xs == [] = []
    | isBlank (head xs) = split' xs
    | otherwise = waitForBlank xs : spilt' (tail xs)

isBlank :: Char -> Bool
isBlank x = if x == ' ' then True else False

waitForBlank :: String -> String
waitForBlank [] = []
waitForBlank (x:xs)
    | isBlank x = []
    | otherwise = x : waitForBlank xs

Upvotes: 0

Views: 2680

Answers (2)

ThreeFx
ThreeFx

Reputation: 7350

You declare a function split' and a function spilt'. Correct the typos and you should be good to go!

split' :: RawText -> [SingleWord]
spilt' [] = [] -- the first typo is here
split' xs
    | xs == [] = []
    | isBlank (head xs) = split' xs
    | otherwise = waitForBlank xs : spilt' (tail xs) -- here is the second typo

To explain the message, it is best to split your code into the two different functions you declared:

split' :: RawText -> [SingleWord]
split' xs
    | xs == [] = []
    | isBlank (head xs) = split' xs
    | otherwise = waitForBlank xs : spilt' (tail xs)

-- spilt' :: [a] -> [a] (or something like this)
spilt' [] = []

Now when you recurse in split', you call spilt'. Since the function spilt' you declared doesn't handle the non-empty list, it throws an exception.


On a further note, if you correct the typo, you won't have to handle the empty list twice:

import Prelude hiding (split)

split :: RawText -> [SingleWord]
split [] = []
split xs
    -- | xs == [] = [] this case is matched by the pattern
    --                 above and thus not needed
    | isBlank (head xs) = split' xs
    | otherwise = waitForBlank xs : split (tail xs)

Also, when pattern matching on lists, you can explicitly pattern match against the head and tail of the list by explicitly writing out the fist cons application:

split s@(x:xs)
    | isBlank x = split' xs
    | otherwise = waitForBlank s : split' xs

But somehow this function still seems kind of clunky, there must be a better way to skip all letters. Let's have a look at the library functions to see what could help us. You can find them here:

-- drops letters from RawText while a certain predicate holds
dropWhile :: (a -> Bool) -> [a] -> [a]
-- takes letters form RawText while a certain predicate holds
takeWhile :: (a -> Bool) -> [a] -> [a]

These look rather promising. Now we can rewrite waitForBlank as:

waitForBlank xs = takeWhile (not . isBlank) xs
-- or, pointfree:
waitForBlank = takeWhile (not . isBlank)

Using dropWhile we can also make the split function more efficient by skipping all non-space characters. Note that the space is included in the part of dropWhile and has to be dropped explicitly.

split :: RawText -> [SingleWord]
split [] = []
split xs = waitForBlank xs : split (dropUntilBlank xs)

-- With dropUntilBlank defined as
dropUntilBlank xs = tail (dropWhile (not . isBlank) xs)

-- without the call of tail, dropUntilBlank would keep the space in between the words:
dropWhile (not . isBlank) "a word" => " word"
--              note the extra space: ^^^

-- using tail on this now produces the correct word:
tail (dropWhile (not . isBlank) "a word") = "word"

Now the result looks nice and clean:

split :: RawText -> [SingleWord]
split [] = []
split xs = waitForBlank xs : split (dropUntilBlank xs)

waitForBlank xs = takeWhile (not . isBlank) xs

dropUntilBlank xs = tail (dropWhile (not . isBlank) xs)

The last part can also be combined by using break:

split :: RawText -> [SingleWord]
split [] = []
split xs = word : split' rest
     where
         (word, (space:rest)) = break isBlank xs

This answer assumes that all words are separated by a single space. For multiple spaces, dropUntilBlank has to be rewritten to exclude multiple blanks by substituting tail for dropWhile isBlank.

Upvotes: 6

dijam
dijam

Reputation: 668

My solution (feedback welcome,newbie here)

split' :: RawText -> [SingleWord]
split' [] = []
split' (x:xs)
    | isBlank x = split' xs
    | otherwise = waitForBlank (x:xs) : split' (drop (length (waitForBlank (x:xs))) xs)

isBlank :: Char -> Bool
isBlank x = if x == ' ' then True else False

waitForBlank :: String -> String
waitForBlank [] = []
waitForBlank (x:xs)
    | isBlank x = []
    | otherwise = x : waitForBlank xs

Upvotes: 0

Related Questions