Reputation: 34884
I'm looking for a simple way to implement a split function. Here is what I have:
import Data.List
groupBy (\x y -> y /= ',') "aaa, bbb, ccc, ddd"
=> ["aaa",", bbb",", ccc",", ddd"]
It's almost what I want, except that the delimiter "," and even the extra whitespace end up in the result. I'd like it to be ["aaa","bbb","ccc","ddd"].
So what is the simplest way to do that?
Upvotes: 0
Views: 100
Reputation: 9566
Think about what your group separator is.
In your case, it looks like you want to drop both commas and whitespace, so why not pass the separators as a list?
split :: Eq a => [a] -> [a] -> [[a]]
split sep seq = ...
You can then group the elements by writing (on comes from Data.Function)
groupBy ((==) `on` (flip elem sep)) seq
which gives
[ "aaa"
, ", "
, "bbb"
, ", "
, "ccc"
, ", "
, "ddd"
]
and then filter out the separator groups, keeping only the valid ones
filter (not.flip elem sep.head) $ groupBy ((==) `on` (flip elem sep)) seq
which returns
["aaa","bbb","ccc","ddd"]
Of course, if you want a ready-made function, then Data.List.Split is great!
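For reference, the grouping and filtering steps above can be assembled into one complete definition. This is only a minimal sketch using base: the helper names isSep and isSeparatorGroup are made up for illustration, and the input is named xs rather than seq to avoid shadowing Prelude's seq.

```haskell
import Data.Function (on)
import Data.List (groupBy)

-- Split a list on runs of separator elements, dropping the separators.
-- The first argument lists every element that counts as a separator.
split :: Eq a => [a] -> [a] -> [[a]]
split sep xs =
  filter (not . isSeparatorGroup) (groupBy ((==) `on` isSep) xs)
  where
    -- True when the element is one of the separators (hypothetical helper)
    isSep x = x `elem` sep
    -- groupBy never produces empty groups, so checking the first element
    -- is enough to classify the whole group (hypothetical helper)
    isSeparatorGroup g = any isSep (take 1 g)
```

With this definition, split ", " "aaa, bbb, ccc, ddd" evaluates to ["aaa","bbb","ccc","ddd"]. Note that consecutive separators are treated as a single delimiter, so no empty groups appear in the result.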
Explanation
This split function works for any type a that has an Eq instance (i.e. two values of type a can be compared for equality), not just Char.
A (list-based) string in Haskell is written as [Char], but a list of chars (not a string) is also written as [Char].
In our split function, the first argument is the list of valid separators (e.g. for [Char] it may be ", "), and the second argument is the source list to split (e.g. for [Char] it may be "aaa, bbb"). A better signature could be:
type Separators a = [a]
split :: Eq a => Separators a -> [a] -> [[a]]
or data/newtype variations, but that is another story.
So our first argument has the same type as the second one, but they are not the same thing.
The result type is a list of strings. As a string is [Char], the result type is [[Char]]. If we prefer a general type (not just Char), it becomes [[a]].
An example of splitting with numbers might be:
Prelude> split [5,10,15] [1..20]
[[1,2,3,4],[6,7,8,9],[11,12,13,14],[16,17,18,19,20]]
[5,10,15] is the separator list, [1..20] the input list to split.
(Thank you very much Nick B!)
Upvotes: 5
Reputation: 7350
Have a look at the splitOn function from the Data.List.Split module (part of the split package):
splitOn ", " "aaa, bbb, ccc, ddd" -- returns ["aaa","bbb","ccc","ddd"]
It splits a given list on every occurrence of the complete substring. Alternatively, you can also use splitOneOf:
splitOneOf ", " "aaa, bbb, ccc, ddd" -- returns ["aaa","","bbb","","ccc","","ddd"]
Although it returns some empty strings, it has the advantage of splitting at any one of the given characters. The empty strings can be removed with a simple filter.
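As a rough sketch of that last step, the filter can be composed directly with splitOneOf. This assumes the split package is installed; the name splitNonEmpty is hypothetical and not part of the library.

```haskell
import Data.List.Split (splitOneOf)

-- Split on any of the given separator elements, then drop the empty
-- pieces that appear wherever two separators are adjacent.
-- splitNonEmpty is an illustrative name, not a Data.List.Split function.
splitNonEmpty :: Eq a => [a] -> [a] -> [[a]]
splitNonEmpty seps = filter (not . null) . splitOneOf seps
```

Here splitNonEmpty ", " "aaa, bbb, ccc, ddd" gives ["aaa","bbb","ccc","ddd"]. The same module also exports wordsBy, which takes a predicate instead of a separator list and likewise drops empty pieces.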
Upvotes: 5