Reputation: 72
With thanks to Remove white space from string, I can successfully remove the whitespace in a string, but in my case, I also need to separate the words and put them all in a list like the following example.
Input
" A \t String with many\nspaces."
would output
["A","String","with","many","spaces."]
I am able to output this
["","A","","","","String","with","many"]
with the following code
> splitWords :: String -> [String]
> splitWords [] =[]
> splitWords as =splitWord "" as
> splitWord _ [] = []
> splitWord word ('\n':as) = word : splitWord "" as
> splitWord word ('\t':as) = word : splitWord "" as
> splitWord word (' ':as) = word : splitWord "" as
> splitWord word (a:as) = splitWord (word ++ [a]) as
Since I'm trying to learn haskell, solutions without using other libraries would be ideal!
Upvotes: 1
Views: 1635
Reputation: 2983
What we need is a parser. This is simply a function that takes a string as input and returns a data structure as output. I'll show you a simplified way to create a parser in the "combinator" style. What this means is that we'll build the parser we want out of smaller parsers (by combining them).
This isn't the best or most efficient way to do this, but it will demonstrate the technique. And it doesn't require any libraries!
We'll start with a language pragma to decrease some boilerplate:
{-# LANGUAGE DeriveFunctor #-}
Now let's create a data type to represent parsing functions.
data Parser a = P { parser :: String -> Maybe (String, a) } deriving Functor
Basically, the parser is a function underneath the data wrapper. The way it will work is it will take a string as input and if its criteria matches characters at the beginning of the string, then it will consume those characters, create the data of type a
and return a Just
containing the unconsumed input and the new item. However, if the criteria fails, then it just returns Nothing
.
We'll implement Applicative and Monad for our Parser type, then we'll be able to use do notation. This is one of the coolest features of Haskell (IMHO). We won't use the Applicative <*>
, but we need the instance in order to implement Monad. (Although Applicative is awesome in its own right.)
instance Applicative Parser where
pure x = P (\input -> Just (input, x))
f <*> p = do
f' <- f
p' <- p
return $ f' p'
The key operation Monad requires is the bind (>>=
) which takes the result of a first parser and feeds it to a function that returns a second parser. This is the most convenient way to combine parsers. It lets us accumulate (or throw away) results without manually threading the input through the parser functions.
instance Monad Parser where
return = pure
p >>= f = P (\input -> case parse p input of
Just (rest, x) -> parse (f x) rest
_ -> Nothing)
Next we need a way to create a "primitive" parser. We'll make a function that takes a Char
predicate and returns a parser that will accept a single character that passes the predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy p = P (\input -> case input of
(x:xs) | p x -> Just (xs, x) -- success!
_ -> Nothing -- failure :(
There are lots of other ways we could manipulate Parsers, but we'll stick to solving the given problem. The next thing we need is a way to repeat a Parser. That's where the while
function comes in handy. It will take a parser that produces items of type a
and repeat it until it fails, accumulating the results in a list.
while :: Parser a -> Parser [a]
while p = P (\input -> case parse p input of
Nothing -> Just (input, [])
Just (rest, x) -> parse (fmap (x:) (while p)) rest)
We're almost done. We'll create the predicates to tell whitespace from non-whitespace.
isWhitespace c = c == ' ' || c == '\t' || c == '\n'
isNotWhiteSpace = not . isWhitespace
Ok, now we'll see how awesome do-notation is. First we create a parser for a single word.
word :: Parser String
word = do
c <- (satisfy isNotWhitespace) -- grab the first character
cs <- while (satisfy isNotWhitespace) -- get any other characters
while (satisfy isWhitespace) -- eat the trailing whitespace
return (c:cs)
We can finally implement the Parser we really want!
splitWords :: Parser [String]
splitWords = do
while (satisfy isWhitespace) -- eat up any leading whitespace
while word
And finally, try it!
main :: IO ()
main = do
let input = " A \t String with many\nspaces."
case parse splitWords input of
Nothing -> putStrLn "failed!"
Just (_, result) -> putStrLn . show $ result
This is what I get in ghci:
λ main
["A","String","with","many","spaces."]
Upvotes: 2
Reputation: 866
Do you need to do it yourself? If not, use Data.String.words
.
λ words " A \t String with many\nspaces."
["A","String","with","many","spaces."] :: [String]
words
is defined by:
words :: String -> [String]
words s = case dropWhile Char.isSpace s of
"" -> []
s' -> w : words s''
where (w, s'') = break Char.isSpace s'
Edit: not using Data.String functions.
You were not too far off.
First, you are missing the last word in your output.
You can solve that by changing the line splitWord _ [] = []
to splitWord word [] = [word]
.
The next issue is the empty strings that are added to the list. You need to filter them out (I made a top-level function to demonstrate):
addIfNotEmpty :: String -> [String] -> [String]
addIfNotEmpty s l = if s == "" then l else s:l
Using this function:
splitWord word [] = addIfNotEmpty word []
splitWord word ('\n':as) = addIfNotEmpty word $ splitWord "" as
splitWord word ('\t':as) = addIfNotEmpty word $ splitWord "" as
splitWord word (' ':as) = addIfNotEmpty word $ splitWord "" as
splitWord word (a:as) = splitWord (word ++ [a]) as
And tadaa! It works. But wait, we are not done!
Tidying up
Let's start by splitWords
. Not much to do here, but we can use eta-reduction:
splitWords :: String -> [String]
splitWords = splitWord ""
Next, notice that for each type of space, the action is the same. Let's remove the repetition:
splitWord word (c:cs)
| c `elem` " \t\n" = addIfNotEmpty word $ splitWord "" cs
| otherwise = splitWord (word ++ [c]) cs
I used elem
here to check if the next character is a space, there are arguably better ways to do it.
Final result:
splitWords :: String -> [String]
splitWords = splitWord ""
splitWord :: String -> String -> [String]
splitWord word [] = addIfNotEmpty word []
splitWord word (c:cs)
| c `elem` " \t\n" = addIfNotEmpty word $ splitWord "" cs
| otherwise = splitWord (word ++ [c]) cs
addIfNotEmpty :: String -> [String] -> [String]
addIfNotEmpty s l = if s == "" then l else s:l
Upvotes: 2