Jace Ziegler

Reputation: 1

In ML, how do you keep the delimiter in the output of the String.tokens function?

Here is a basic use of the String.tokens function:

fun f c = c = #" ";
val testStr = "int main(){return 42;}";
val stringL = String.tokens f testStr;

It returns:

val stringL = ["int","main(){return","42;}"] : string list

How do I get it to return:

val stringL = ["int"," ","main(){return"," ","42;}"] : string list

I tried piping the value forward so that the " " would be returned as well, but I only got errors.

Upvotes: 0

Views: 60

Answers (1)

Chris

Reputation: 36620

Well, if you know that exactly one space separates each pair of elements in the list, you can add the spaces back afterwards.

fun intersperse(_, []) = []
  | intersperse(_, lst as [_]) = lst
  | intersperse(v, x::xs) = 
      x :: v :: intersperse(v, xs)
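
As a quick check, intersperse can be applied to the tokens from the question. This is only a sketch: the names isSpace, tokens, withSpaces and collapsed are introduced here for illustration, and intersperse is repeated so the snippet runs on its own.

```sml
(* intersperse as defined above, repeated so this snippet is self-contained *)
fun intersperse (_, []) = []
  | intersperse (_, lst as [_]) = lst
  | intersperse (v, x :: xs) = x :: v :: intersperse (v, xs)

(* Same predicate as f in the question *)
fun isSpace c = c = #" "

val tokens = String.tokens isSpace "int main(){return 42;}"
val withSpaces = intersperse (" ", tokens)
(* withSpaces = ["int"," ","main(){return"," ","42;}"] *)

(* Caveat: String.tokens collapses runs of delimiters, so the
   original spacing cannot be recovered this way. *)
val collapsed = intersperse (" ", String.tokens isSpace "int  main")
(* collapsed = ["int"," ","main"], although the input had two spaces *)
```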

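If you want to tokenize and keep the delimiters exactly as they appear (there may be more than one space in a row, for instance), then you're going to have to look beyond String.tokens, which discards the delimiters and treats a run of them as a single separator.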

We might write something like the following to iterate over a list of characters and split based on a predicate, but retain the delimiters.

fun splitOn'(p, [], acc) = List.map List.rev (List.rev acc)
  | splitOn'(p, ch::chs, []) = splitOn'(p, chs, [[ch]])
  | splitOn'(p, ch::chs, []::xs) = splitOn'(p, chs, [ch]::xs)
  | splitOn'(p, ch::chs, acc as (first as x::_)::xs) =
      if p ch = p x then
        splitOn'(p, chs, (ch::first)::xs)
      else
        splitOn'(p, chs, [ch]::acc);

fun splitOn(p, str) =
  let
    val lst = String.explode str
    val result = splitOn'(p, lst, [])
  in
    List.map String.implode result
  end;

Of course, we might also use a left fold to reproduce this behavior.

fun splitOn(p, str) =
  let
    val chars = String.explode str 
    val grouped = List.foldl 
      (fn (ch, []) => [[ch]] 
        | (ch, []::xs) => [ch]::xs 
        | (ch, acc as (first as x::_)::xs) => 
            if p ch <> p x then [ch]::acc
            else (ch::first)::xs)
      []
      chars
  in
    List.map String.implode (List.map List.rev (List.rev grouped))
  end;
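
A third possibility, offered only as a sketch, is to avoid exploding the string and work with the Basis Library's Substring structure instead. The name splitKeep is hypothetical; Substring.full, Substring.isEmpty, Substring.sub, Substring.splitl and Substring.string are standard Basis functions. Each step takes the longest prefix whose characters agree with the first character on whether they satisfy the predicate.

```sml
(* Hypothetical helper: group a string into maximal runs of delimiter
   and non-delimiter characters, keeping both kinds of run. *)
fun splitKeep (p, str) =
  let
    fun loop (ss, acc) =
      if Substring.isEmpty ss then
        List.rev acc
      else
        let
          (* Does the current run consist of delimiters? *)
          val isDelim = p (Substring.sub (ss, 0))
          (* Take every leading character of the same kind *)
          val (run, rest) =
            Substring.splitl (fn ch => p ch = isDelim) ss
        in
          loop (rest, Substring.string run :: acc)
        end
  in
    loop (Substring.full str, [])
  end;

val example1 = splitKeep (fn c => c = #" ", "int main(){return 42;}")
(* example1 = ["int"," ","main(){return"," ","42;}"] *)

val example2 = splitKeep (Char.isSpace, "int  main")
(* example2 = ["int","  ","main"]: the double space is preserved *)
```

Because every run contains at least its first character, each step consumes input and the loop terminates.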

Splitting on strings

If you're seeking to split on multi-character strings, then the first step is to find the index of the next occurrence of the delimiter substring. We step forward through the string, comparing each substring of the same length as the delimiter against it. If we find a match, we return its index wrapped in SOME. If we run out of positions to check, we return NONE.

fun findSubstring(str, substr) =
  let 
    val strLen = String.size str
    val substrLen = String.size substr 
  
    fun aux(idx) =
      (* strLen - substrLen is the last index at which substr still fits *)
      if idx > strLen - substrLen then
        NONE
      else if String.substring(str, idx, substrLen) = substr then
        SOME idx
      else
        aux(idx + 1)
  in
    aux(0)
  end;

We then just search for tokens until the delimiter isn't found anymore with a straightforward recursive function.

fun splitOn(str, delim) =
  case findSubstring(str, delim) of
    NONE => [str]
  | SOME idx => 
      let
        val delimLen = String.size delim
        val first = String.substring(str, 0, idx)
        val tail = String.extract(str, idx + delimLen, NONE)
      in
        first :: splitOn(tail, delim)
      end;

Evaluating:

splitOn("Hello, world, foo", ", ");

Yields:

["Hello", "world", "foo"]
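
Note that this version drops the delimiters. If they should be kept, as in the original question, one option is sketched below. The name splitKeepDelim is hypothetical, and the sketch swaps the hand-written search for the Basis Library function Substring.position, which splits a substring just before the first occurrence of a given string (the second half is empty when there is no occurrence). An empty delimiter is treated as "no split", which is an assumption made here to avoid looping forever.

```sml
(* Hypothetical variant: split on a multi-character delimiter and
   keep each delimiter as its own list element. *)
fun splitKeepDelim (str, delim) =
  let
    val delimLen = String.size delim

    fun loop ss =
      let
        (* pref: text before the next delimiter;
           suff: the delimiter and everything after it, or empty *)
        val (pref, suff) = Substring.position delim ss
      in
        if Substring.isEmpty suff then
          [Substring.string pref]
        else
          Substring.string pref
            :: delim
            :: loop (Substring.triml delimLen suff)
      end
  in
    (* Assumption: an empty delimiter means the string is not split *)
    if delimLen = 0 then [str] else loop (Substring.full str)
  end;

val kept = splitKeepDelim ("Hello, world, foo", ", ")
(* kept = ["Hello",", ","world",", ","foo"] *)
```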

Upvotes: 0
