Reputation: 53
I am trying to write a program in Haskell to split a string by delimiter.
And I have studied different examples provided by other users. An example would the the code that is posted below.
split :: String -> [String]
split [] = [""]
split (c:cs)
| c == ',' = "" : rest
| otherwise = (c : head rest) : tail rest
where
rest = split cs
Sample Input: "1,2,3"
.
Sample Output: ["1","2","3"]
.
I have been trying to modify the code so that the output would be something like ["1", "," , "2", "," , "3"]
which includes the delimiter in the output as well , but I just cannot succeed.
For example, I changed the line:
| c == ',' = "" : rest
into:
| c == ',' = "," : rest
But the result becomes ["1,","2,","3"]
.
What is the problem and in which part I have had a misunderstanding?
Upvotes: 5
Views: 12649
Reputation: 50819
If you're trying to write this function "for real" instead of writing the character-by-character recursion for practice, I think a clearer method is to use the break
function from Data.List
. The following expression:
break (==',') str
breaks the string into a tuple (a,b)
where the first part consists of the initial "comma-free" part, and the second part is either more string starting with the comma or else empty if there's no more string.
This makes the definition of split
clear and straightforward:
split str = case break (==',') str of
(a, ',':b) -> a : split b
(a, "") -> [a]
You can verify that this handles split ""
(which returns [""]
), so there's no need to treat that as a special case.
This version has the added benefit that the modification to include the delimiter is also easy to understand:
split2 str = case break (==',') str of
(a, ',':b) -> a : "," : split2 b
(a, "") -> [a]
Note that I've written the patterns in these functions in more detail than is necessary to make it absolute clear what's going on, and this also means that Haskell does a duplicate check on each comma. For this reason, some people might prefer:
split str = case break (==',') str of
(a, _:b) -> a : split b
(a, _) -> [a]
or, if they still wanted to document exactly what they were expecting in each case branch:
split str = case break (==',') str of
(a, _comma:b) -> a : split b
(a, _empty) -> [a]
Upvotes: 8
Reputation: 120711
That example code is poor style. Never use head
and tail
unless you know exactly what you're doing (these functions are unsafe, partial functions). Also, equality comparisons are usually better written as dedicated patterns.
With that in mind, the example becomes:
split :: String -> [String]
split "" = [""]
split (',':cs) = "" : split cs
split (c:cs) = (c:cellCompletion) : otherCells
where cellCompletion : otherCells = split cs
(Strictly speaking, this is still unsafe because the match cellCompletion:otherCells
is non-exhaustive, but at least it happens in a well-defined place which will give a clear error message if anything goes wrong.)
Now IMO, this makes it quite a bit clearer what's actually going on here: with "" : split cs
, the intend is not really to add an empty cell to the result. Rather, it is to add a cell which will be filled up by calls further up in the recursion stack. This happens because those calls deconstruct the deeper result again, with the pattern match cellCompletion : otherCells = split cs
, i.e. they pop off the first cell again and prepend the actual cell contents.
So, if you change that to "," : split
, the effect is just that all cells you build will already be pre-terminated with a ,
character. That's not what you want.
Instead you want to add an additional cell that won't be touched anymore. That needs to be deeper in the result then:
split (',':cs) = "" : "," : split cs
Upvotes: 5
Reputation: 476547
Instead of altering code in the hope that it matches the expecations, it is usually better to understand the code fragment first.
split :: String -> [String]
split [] = [""]
split (c:cs) | c == ',' = "" : rest
| otherwise = (c : head rest) : tail rest
where rest = split cs
First of all we better analyze what split
does. The first statement simply says "The split of an empty string, is a list with one element, the empty string". This seems reasonable. Now the second clause states: "In case the head of the string is a comma, we produce a list where the first element is an empty string, followed by splitting up the remainings of the string.". The last guard says "In case the first character of the string is not a comma, we prepend that character to the first item of the split of the remaining string, followed by the remaining elements of the split of the remaining string". Mind that split
returns a list of strings, so the head rest
is a string.
So if we want to add the delimiter to the output, then we need to add that as a separate string in the output of split
. Where? In the first guard. We should not return "," : rest
, since the head is - by recursion - prepended, but as a separate string. So the result is:
split :: String -> [String]
split [] = [""]
split (c:cs) | c == ',' = "" : "," : rest
| otherwise = (c : head rest) : tail rest
where rest = split cs
Upvotes: 5