cheshire

Reputation: 1159

haskell beginner strings

I want to write a simple string function:

input:    aaabbbAAAcccc 
output:   a3b3A3c4

All I have now is this code:

compress :: String -> String 
compress  x = if length x > 1 
            then do take 1 x ++ show ( length $ filter (==head x) x) ++ compress (drop 1 x)
            else x 

This will obviously not work. Here is the output:

*Main> compress "aaaaaaaaabbbbbbb"
"a9a8a7a6a5a4a3a2a1b7b6b5b4b3b2b"

I take the first character of x, use filter to count how many times "head x" occurs in the string, output that count after the character, and then call the function recursively on x with its first character dropped.

I understand that this is the wrong approach and that I can't write it this way. But how can I keep track of how many equal chars I have in the String, and how do I compare them recursively? Do I have to use something like:

compress :: String -> String 
compress (x:xs) = 

and then recurse on xs and somehow compare it to x, or do I have no choice other than trying to understand Data.List?

Upvotes: 2

Views: 864

Answers (2)

Adam Smith

Reputation: 54213

group (from Data.List) is a natural winner here, taking a string like "Mississippi" and turning it into ["M", "i", "ss", "i", "ss", "i", "pp", "i"], but let's imagine you don't want to do this. How could you implement this with more primitive explicit recursion?

Well you've got takeWhile, to start.

takeWhile :: (a -> Bool) -> [a] -> [a]

Given a predicate and a list, take consecutive elements from the left for as long as each one satisfies the predicate, and stop at the first one that doesn't:

takeWhile (>3) [4, 5, 6, 9, 9, 5, 3, 5, 4] = [4, 5, 6, 9, 9, 5]  -- loses [3, 5, 4]

and even better, you can have span, which does all that and gives you back the remainder.

span :: (a -> Bool) -> [a] -> ([a], [a])
span (>3) [4, 5, 6, 9, 9, 5, 3, 5, 4] = ([4, 5, 6, 9, 9, 5], [3, 5, 4])
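For intuition, span can itself be written with explicit recursion. The following is only a sketch under the hypothetical name spanSketch; it is not the definition used in base, but it should behave the same way on finite lists.

```haskell
-- Hand-written sketch of span. `spanSketch` is a hypothetical name,
-- not the library definition.
spanSketch :: (a -> Bool) -> [a] -> ([a], [a])
spanSketch _ [] = ([], [])
spanSketch p xs@(x:rest)
  | p x       = let (matched, remainder) = spanSketch p rest
                in (x : matched, remainder)   -- keep x, keep scanning
  | otherwise = ([], xs)                      -- first failure: everything left is the remainder
```

Applied to a string, spanSketch (== 'a') "aaabbb" gives ("aaa", "bbb"), which is the split the rest of this answer relies on.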

This should let you pull out each section that has the same letter.

-- here I'm using the @ notation to pattern match on `x` without losing the larger pattern that I'm naming xss
compress (xss@(x:_)) = let (cur, rest) = span (==x) xss

What now? We know both cur and rest will be strings, and that cur is all the consecutive letters from the left that are equal. Well, I suppose we can count them and put one of them up front.

compress (xss@(x:_)) = let (cur, rest) = span (==x) xss
                       in x:(show . length $ cur) ++ ???

What do we do with those question marks? What's the rest of our code? Well we'll have to handle the rest somehow. What do we want to happen with rest? Well we want it to be compressed too. Let's try just recursing here.

compress (xss@(x:_)) = let (cur, rest) = span (==x) xss
                       in x:(show . length $ cur) ++ compress rest

This looks promising, but we're missing the Secret Sauce of recursion -- the base case. What do we do when we run out of elements? We need to do something when we call compress [], and that thing should end up on the end of our chain of (:)s. The natural thing is a [].

compress :: String -> String
compress (xss@(x:_)) = let (cur, rest) = span (==x) xss
                       in x:(show . length $ cur) ++ compress rest
compress []          = []
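As a quick check, here is the same function restated so that the block loads on its own, with a few sample results in comments. The sample inputs other than the ones from the question are my own choices.

```haskell
compress :: String -> String
compress xss@(x:_) =
  let (cur, rest) = span (== x) xss   -- cur: leading run of x; rest: what follows it
  in  x : show (length cur) ++ compress rest
compress [] = []

-- compress "aaabbbAAAcccc"    == "a3b3A3c4"
-- compress "aaaaaaaaabbbbbbb" == "a9b7"
-- compress "aabaa"            == "a2b1a2"   (runs are counted separately)
-- compress ""                 == ""
```

Because span stops at the first character that differs, separate runs of the same letter are counted separately rather than merged.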

Upvotes: 4

racherb

Reputation: 335

A simple solution

A simple way to solve this case would be to apply the following expression, which uses group from Data.List:

Prelude> import Data.List (group)
Prelude Data.List> let str = "aaabbbAAAcccc"
Prelude Data.List> concat $ fmap (\x -> [head x]<>show(length x)) (group str)
"a3b3A3c4"

So your compression function would be:

import Data.List (group)

compress :: String -> String
compress str = concat $ fmap (\x -> [head x]<>show(length x)) (group str)

Suggestions

Adopting a more functional way of thinking helps when solving problems in Haskell. If-then-else expressions are rarely seen; lambda functions are convenient in many cases for simplifying code and keeping it readable; and function mapping and recursion are classics of the Haskell world.

How it works

The compress function described performs the following operations:

  • Grouping of elements: group str
  • The groups are mapped, one by one, with a lambda that turns each group into its character followed by the number of elements in the group: fmap (\x -> [head x]<>show(length x))
  • Finally, the result is concatenated: concat
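The three steps above can be sketched as separate named pieces, which makes the intermediate values easier to inspect. The names runs, encodeRun and compressSteps are hypothetical and used only for illustration; encodeRun pattern matches instead of calling head, which is a small change from the lambda.

```haskell
import Data.List (group)

-- Step 1: split the input into runs of equal adjacent characters.
runs :: String -> [String]
runs = group

-- Step 2: turn one run into its character followed by the run's length.
-- group never produces an empty run, so the [] case is only there
-- to keep the function total.
encodeRun :: String -> String
encodeRun run@(c:_) = c : show (length run)
encodeRun []        = []

-- Step 3: encode every run and join the pieces.
compressSteps :: String -> String
compressSteps = concat . fmap encodeRun . runs

-- runs "aaabbbAAAcccc"                == ["aaa","bbb","AAA","cccc"]
-- fmap encodeRun ["aaa","bbb","AAA","cccc"] == ["a3","b3","A3","c4"]
-- compressSteps "aaabbbAAAcccc"       == "a3b3A3c4"
```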

Why doesn't the original code work?

The code presented in the question does not work because the recursion advances through the string one character at a time and, at every position, counts all remaining occurrences of the current character again, without taking into account that those characters were already counted in an earlier step.
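To make the failure concrete, here is the question's function reproduced under the hypothetical name compressBroken (the redundant do is dropped, which does not change its behavior), with two small inputs of my own choosing.

```haskell
-- The question's function, renamed; kept deliberately incorrect.
compressBroken :: String -> String
compressBroken x =
  if length x > 1
    then take 1 x
           ++ show (length (filter (== head x) x))  -- counts every remaining match
           ++ compressBroken (drop 1 x)             -- advances by only one character
    else x

-- compressBroken "aaabb" == "a3a2a1b2b"   (one entry per character, not per run)
-- compressBroken "aba"   == "a2b1a"       (filter also counts the non-adjacent 'a')
```

The second example shows a further issue: filter looks at the whole remaining string, so it also counts matching characters that are not adjacent.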

On the other hand, the list pattern (x:xs) could have been used to traverse the list: x binds the head and xs binds the tail, which is passed on to the recursive call.
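One possible way to use the (x:xs) pattern directly is to carry the current character and a running count through the recursion. This is only a sketch; compressRec and the local helper go are hypothetical names, and no Data.List functions are needed.

```haskell
compressRec :: String -> String
compressRec []     = []
compressRec (x:xs) = go x 1 xs
  where
    -- go c n rest: c is the character of the current run,
    -- n is the length of that run so far.
    go :: Char -> Int -> String -> String
    go c n (y:ys)
      | y == c    = go c (n + 1) ys           -- same character: extend the run
      | otherwise = c : show n ++ go y 1 ys   -- new character: emit the run, start another
    go c n []     = c : show n                -- end of input: emit the last run

-- compressRec "aaabbbAAAcccc" == "a3b3A3c4"
```

Each character is visited once, and a count is emitted only when the run ends, which is what the original attempt was missing.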

Upvotes: 4
