Incerteza

Reputation: 34884

"Split" returns redundant characters

I'm looking for a simple way to implement a split function. Here is what I have:

import Data.List
groupBy (\x y -> y /= ',') "aaa, bbb, ccc, ddd"

=> ["aaa",", bbb",", ccc",", ddd"]

It's almost what I want, except that the delimiter "," and an extra space end up in the results. I'd like it to be ["aaa","bbb","ccc","ddd"].

So what is the simplest way to do that?

Upvotes: 0

Views: 100

Answers (2)

josejuan

Reputation: 9566

Think about it: what is your group separator?

In your case, it looks like you want to drop both commas and whitespace, so why not take the separators as an argument?

split :: Eq a => [a] -> [a] -> [[a]]
split sep seq = ...

You can then group the elements by writing (on comes from Data.Function)

groupBy ((==) `on` (flip elem sep)) seq

which gives

[ "aaa"
, ", "
, "bbb"
, ", "
, "ccc"
, ", "
, "ddd"
]

and then filter out the separator groups, keeping only the valid ones

filter (not.flip elem sep.head) $ groupBy ((==) `on` (flip elem sep)) seq

returning

["aaa","bbb","ccc","ddd"]
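Putting the two steps together, a minimal self-contained sketch might look like the following. The helper name isSeparatorGroup is only illustrative (it replaces the head-based test with a pattern match), and the sketch assumes that runs of adjacent separators should collapse into a single split point.

```haskell
import Data.Function (on)
import Data.List (groupBy)

-- Split a list on runs of any of the given separator elements.
split :: Eq a => [a] -> [a] -> [[a]]
split sep = filter (not . isSeparatorGroup) . groupBy ((==) `on` (`elem` sep))
  where
    -- Illustrative helper: a group is a separator group if its first
    -- element is a separator (groupBy never produces empty groups).
    isSeparatorGroup (x:_) = x `elem` sep
    isSeparatorGroup []    = False

main :: IO ()
main = do
  print (split ", " "aaa, bbb, ccc, ddd")    -- ["aaa","bbb","ccc","ddd"]
  print (split [5, 10, 15] [1 .. 20 :: Int])
```

Note that leading or trailing separators simply disappear with this approach, so ", aaa" yields ["aaa"] rather than an empty first field.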

Of course, if you want a ready-made function, then Data.List.Split is great!

Explanation

This split function works for any type a that is an instance of the Eq class (i.e. two values of type a can be compared for equality), not just Char.

A (list-based) string in Haskell has type [Char], but a list of characters that is not meant as a string also has type [Char].

In our split function, the first argument is the list of valid separators (e.g. ", " for [Char]) and the second argument is the source list to split (e.g. "aaa, bbb" for [Char]). A better signature could be:

type Separators a = [a]

split :: Eq a => Separators a -> [a] -> [[a]]

or a data/newtype variation, but that is another story.
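For the curious, one possible newtype variation could be sketched as follows. The constructor Separators and the function name splitWith are hypothetical, chosen only for illustration; the point is that the compiler then rejects a call with the two arguments swapped.

```haskell
import Data.Function (on)
import Data.List (groupBy)

-- Hypothetical wrapper that marks a list as "the separators".
newtype Separators a = Separators [a]

-- Same grouping-and-filtering idea as before, but the separator
-- argument can no longer be confused with the input list.
splitWith :: Eq a => Separators a -> [a] -> [[a]]
splitWith (Separators sep) =
  -- Each group is either all separators or all non-separators,
  -- so checking for any separator identifies the groups to drop.
  filter (not . any (`elem` sep)) . groupBy ((==) `on` (`elem` sep))

main :: IO ()
main = print (splitWith (Separators ", ") "aaa, bbb")  -- ["aaa","bbb"]
```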

So our first argument has the same type as the second one, but they are not the same thing.

The result type is a list of strings. Since a string is [Char], the result type is [[Char]]. If we prefer a general type (not just Char), it becomes [[a]].

An example of splitting a list of numbers might be:

Prelude> split [5,10,15] [1..20]
[[1,2,3,4],[6,7,8,9],[11,12,13,14],[16,17,18,19,20]]

[5,10,15] is the separator list and [1..20] is the input list to split.

(Thank you very much Nick B!)

Upvotes: 5

ThreeFx

Reputation: 7350

Have a look at the splitOn function from the Data.List.Split module (part of the split package):

splitOn ", " "aaa, bbb, ccc, ddd" -- returns ["aaa","bbb","ccc","ddd"]

It splits a given list on every occurrence of the complete delimiter sublist. Alternatively, you can use splitOneOf:

splitOneOf ", " "aaa, bbb, ccc, ddd" -- returns ["aaa","","bbb","","ccc","","ddd"]

Although it returns some empty strings, it has the advantage of splitting at any one of the given characters. The empty strings can be removed with a simple filter.
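As a rough sketch of that filter, assuming the split package is installed, it could look like this (the name splitNonEmpty is made up for this example):

```haskell
import Data.List.Split (splitOneOf)

-- Split on any of the delimiter characters, then drop the empty
-- strings that appear between adjacent delimiters.
splitNonEmpty :: String -> String -> [String]
splitNonEmpty delims = filter (not . null) . splitOneOf delims

main :: IO ()
main = print (splitNonEmpty ", " "aaa, bbb, ccc, ddd")  -- ["aaa","bbb","ccc","ddd"]
```

The same module also exports wordsBy, which takes a predicate and drops the empty pieces itself, so wordsBy (`elem` ", ") should give the same result here.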

Upvotes: 5
