Reputation: 1
Here is a basic use of the String.tokens function:
fun f c = c = #" ";
val testStr = "int main(){return 42;}";
val stringL = String.tokens f testStr;
It returns:
val stringL = ["int","main(){return","42;}"] : string list
How do I get it to return:
val stringL = ["int"," ","main(){return"," ","42;}"] : string list
I tried piping the value forward so that the " " delimiters would be returned as well, but I only got errors.
Upvotes: 0
Views: 60
Reputation: 36620
Well, knowing that spaces occur between each element in the list, you can add them back later.
fun intersperse(_, []) = []
  | intersperse(_, lst as [_]) = lst
  | intersperse(v, x::xs) =
      x :: v :: intersperse(v, xs)
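As a minimal sketch of how this could be applied to the input from the question: the definition is repeated so the snippet runs on its own, and `isSpace` is just the question's predicate `f` under a more descriptive (hypothetical) name. Note that String.tokens collapses runs of delimiters and drops leading and trailing ones, so this only recovers the original spacing when tokens are separated by exactly one space.

```sml
(* Repeated from above so the snippet runs on its own. *)
fun intersperse (_, []) = []
  | intersperse (_, lst as [_]) = lst
  | intersperse (v, x :: xs) = x :: v :: intersperse (v, xs);

(* The question's predicate f, renamed for readability. *)
fun isSpace c = c = #" ";

val testStr = "int main(){return 42;}";

(* Tokenize on spaces, then put a single space back between tokens. *)
val stringL = intersperse (" ", String.tokens isSpace testStr);
(* val stringL = ["int"," ","main(){return"," ","42;}"] : string list *)
```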
If you want to tokenize and keep the delimiters (which may be more than one space, for instance) then you're going to have to look beyond String.tokens.
We might write something like the following to iterate over a list of characters and split based on a predicate, but retain the delimiters.
fun splitOn'(p, [], acc) = List.map List.rev (List.rev acc)
  | splitOn'(p, ch::chs, []) = splitOn'(p, chs, [[ch]])
  | splitOn'(p, ch::chs, []::xs) = splitOn'(p, chs, [ch]::xs)
  | splitOn'(p, ch::chs, acc as (first as x::_)::xs) =
      if p ch = p x then
        splitOn'(p, chs, (ch::first)::xs)
      else
        splitOn'(p, chs, [ch]::acc);

fun splitOn(p, str) =
  let
    val lst = String.explode str
    val result = splitOn'(p, lst, [])
  in
    List.map String.implode result
  end;
Of course, we might also use a left fold to reproduce this behavior.
fun splitOn(p, str) =
  let
    val chars = String.explode str
    val grouped = List.foldl
      (fn (ch, []) => [[ch]]
        | (ch, []::xs) => [ch]::xs
        | (ch, acc as (first as x::_)::xs) =>
            if p ch <> p x then [ch]::acc
            else (ch::first)::xs)
      []
      chars
  in
    List.map String.implode (List.map List.rev (List.rev grouped))
  end;
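For comparison, here is a sketch of the same grouping idea written against the Basis Library's Substring structure rather than exploded character lists. This is a swapped-in alternative, not the approach above; the name `splitKeep` is hypothetical. It relies on `Substring.splitl`, which returns the longest prefix whose characters all satisfy a predicate together with the remainder.

```sml
(* Split a string into alternating runs of delimiter and non-delimiter
   characters, keeping both kinds of run in the result. *)
fun splitKeep p s =
  let
    fun go ss =
      if Substring.isEmpty ss then []
      else
        let
          (* Classify the run by its first character... *)
          val inDelim = p (Substring.sub (ss, 0))
          (* ...then take every following character of the same class. *)
          val (run, rest) = Substring.splitl (fn ch => p ch = inDelim) ss
        in
          Substring.string run :: go rest
        end
  in
    go (Substring.full s)
  end;

val kept = splitKeep Char.isSpace "int main(){return 42;}";
(* val kept = ["int"," ","main(){return"," ","42;}"] : string list *)
```

Each run contains at least its first character, so the recursion always makes progress. Consecutive delimiters stay together in one element, for example "a  b" gives ["a","  ","b"].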
If you're seeking to split on multi-character strings, then the first step would be to find the index of the next such delimiter substring. We step forward through the string, checking substrings of the same length as the delimiter. If we find a match, we return SOME of its index. If we don't, we return NONE.
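fun findSubstring(str, substr) =
  let
    val strLen = String.size str
    val substrLen = String.size substr
    fun aux(idx) =
      (* Stop once fewer than substrLen characters remain. Using > rather
         than >= here lets a match at the very end of str be found. *)
      if idx + substrLen > strLen then
        NONE
      else if String.substring(str, idx, substrLen) = substr then
        SOME idx
      else
        aux(idx + 1)
  in
    aux(0)
  end;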
A straightforward recursive function then splits off tokens until the delimiter is no longer found.
fun splitOn(str, delim) =
  case findSubstring(str, delim) of
    NONE => [str]
  | SOME idx =>
      let
        val delimLen = String.size delim
        val first = String.substring(str, 0, idx)
        val tail = String.extract(str, idx + delimLen, NONE)
      in
        first :: splitOn(tail, delim)
      end;
Evaluating:
splitOn("Hello, world, foo", ", ");
Yields:
["Hello", "world", "foo"]
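Since the original question asked to keep the delimiters, here is a hedged sketch of a multi-character variant that does so. It swaps the hand-written search for the Basis Library's `Substring.position`, which splits a substring at the first occurrence of a given string, and `Substring.triml`, which drops characters from the left. The name `splitKeepDelim` is hypothetical, and returning the input unchanged for an empty delimiter is an assumption made here to avoid looping forever.

```sml
fun splitKeepDelim (str, delim) =
  let
    val n = String.size delim
    fun go ss =
      let
        (* suff starts at the first occurrence of delim, or is empty
           when delim does not occur in ss. *)
        val (pref, suff) = Substring.position delim ss
      in
        if Substring.isEmpty suff then
          [Substring.string pref]
        else
          (* Emit the token and the delimiter, then continue past it. *)
          Substring.string pref :: delim :: go (Substring.triml n suff)
      end
  in
    (* Assumption: an empty delimiter means "do not split". *)
    if n = 0 then [str] else go (Substring.full str)
  end;

val parts = splitKeepDelim ("Hello, world, foo", ", ");
(* val parts = ["Hello",", ","world",", ","foo"] : string list *)
```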
Upvotes: 0