Peter.PP
Peter.PP

Reputation: 53

Haskell - Splitting a string by delimiter

I am trying to write a program in Haskell to split a string by delimiter.

And I have studied different examples provided by other users. An example would the the code that is posted below.

split :: String -> [String]
split [] = [""]
split (c:cs)
   | c == ','  = "" : rest
   | otherwise = (c : head rest) : tail rest
 where
   rest = split cs

Sample Input: "1,2,3". Sample Output: ["1","2","3"].

I have been trying to modify the code so that the output would be something like ["1", "," , "2", "," , "3"] which includes the delimiter in the output as well , but I just cannot succeed.

For example, I changed the line:

   | c == ','  = "" : rest

into:

   | c == ','  = "," : rest

But the result becomes ["1,","2,","3"].

What is the problem and in which part I have had a misunderstanding?

Upvotes: 5

Views: 12649

Answers (3)

K. A. Buhr
K. A. Buhr

Reputation: 50819

If you're trying to write this function "for real" instead of writing the character-by-character recursion for practice, I think a clearer method is to use the break function from Data.List. The following expression:

break (==',') str

breaks the string into a tuple (a,b) where the first part consists of the initial "comma-free" part, and the second part is either more string starting with the comma or else empty if there's no more string.

This makes the definition of split clear and straightforward:

split str = case break (==',') str of
                (a, ',':b) -> a : split b
                (a, "")    -> [a]

You can verify that this handles split "" (which returns [""]), so there's no need to treat that as a special case.

This version has the added benefit that the modification to include the delimiter is also easy to understand:

split2 str = case break (==',') str of
                (a, ',':b) -> a : "," : split2 b
                (a, "")    -> [a]

Note that I've written the patterns in these functions in more detail than is necessary to make it absolute clear what's going on, and this also means that Haskell does a duplicate check on each comma. For this reason, some people might prefer:

split str = case break (==',') str of
                (a, _:b) -> a : split b
                (a, _)   -> [a]

or, if they still wanted to document exactly what they were expecting in each case branch:

split str = case break (==',') str of
                (a, _comma:b) -> a : split b
                (a, _empty)   -> [a]

Upvotes: 8

leftaroundabout
leftaroundabout

Reputation: 120711

That example code is poor style. Never use head and tail unless you know exactly what you're doing (these functions are unsafe, partial functions). Also, equality comparisons are usually better written as dedicated patterns.

With that in mind, the example becomes:

split :: String -> [String]
split "" = [""]
split (',':cs) = "" : split cs
split (c:cs) = (c:cellCompletion) : otherCells
 where cellCompletion : otherCells = split cs

(Strictly speaking, this is still unsafe because the match cellCompletion:otherCells is non-exhaustive, but at least it happens in a well-defined place which will give a clear error message if anything goes wrong.)

Now IMO, this makes it quite a bit clearer what's actually going on here: with "" : split cs, the intend is not really to add an empty cell to the result. Rather, it is to add a cell which will be filled up by calls further up in the recursion stack. This happens because those calls deconstruct the deeper result again, with the pattern match cellCompletion : otherCells = split cs, i.e. they pop off the first cell again and prepend the actual cell contents.

So, if you change that to "," : split, the effect is just that all cells you build will already be pre-terminated with a , character. That's not what you want.

Instead you want to add an additional cell that won't be touched anymore. That needs to be deeper in the result then:

split (',':cs) = "" : "," : split cs

Upvotes: 5

willeM_ Van Onsem
willeM_ Van Onsem

Reputation: 476547

Instead of altering code in the hope that it matches the expecations, it is usually better to understand the code fragment first.

split :: String -> [String]
split [] = [""]
split (c:cs) | c == ','  = "" : rest
             | otherwise = (c : head rest) : tail rest
    where rest = split cs

First of all we better analyze what split does. The first statement simply says "The split of an empty string, is a list with one element, the empty string". This seems reasonable. Now the second clause states: "In case the head of the string is a comma, we produce a list where the first element is an empty string, followed by splitting up the remainings of the string.". The last guard says "In case the first character of the string is not a comma, we prepend that character to the first item of the split of the remaining string, followed by the remaining elements of the split of the remaining string". Mind that split returns a list of strings, so the head rest is a string.

So if we want to add the delimiter to the output, then we need to add that as a separate string in the output of split. Where? In the first guard. We should not return "," : rest, since the head is - by recursion - prepended, but as a separate string. So the result is:

split :: String -> [String]
split [] = [""]
split (c:cs) | c == ','  = "" : "," : rest
             | otherwise = (c : head rest) : tail rest
    where rest = split cs

Upvotes: 5

Related Questions