How can I split a string by a delimiter in Common Lisp, as SPLIT-SEQUENCE does, but also keep the delimiters in the resulting list of strings?
For example, I could write:
(split-string-with-delimiter #\. "a.bc.def.com")
and the result would be ("a" "." "bc" "." "def" "." "com")
I've tried the following code (make-adjustable-string makes a string that can be extended with vector-push-extend):
(defun make-adjustable-string (s)
  (make-array (length s)
              :fill-pointer (length s)
              :adjustable t
              :initial-contents s
              :element-type (array-element-type s)))
(defun split-str (string &key (delimiter #\ ) (keep-delimiters nil))
  "Splits a string into a list of strings, with the delimiter still
in the resulting list."
  (let ((words nil)
        (current-word (make-adjustable-string "")))
    (do* ((i 0 (+ i 1))
          (x (char string i) (char string i)))
         ((= (+ i 1) (length string)) nil)
      (if (eql delimiter x)
          (unless (string= "" current-word)
            (push current-word words)
            (push (string delimiter) words)
            (setf current-word (make-adjustable-string "")))
          (vector-push-extend x current-word)))
    (nreverse words)))
But this doesn't print out the last substring/word. I'm not sure what's going on.
Thanks for the help ahead of time!
Something like this?
(defun split-string-with-delimiter (string
                                    &key (delimiter #\ )
                                         (keep-delimiters nil)
                                    &aux (l (length string)))
  (loop for start = 0 then (1+ pos)
        for pos   = (position delimiter string :start start)

        ; no more delimiter found
        when (and (null pos) (not (= start l)))
        collect (subseq string start)

        ; while delimiter found
        while pos

        ; some content found
        when (> pos start) collect (subseq string start pos)
        ; optionally keep delimiter
        when keep-delimiters collect (string delimiter)))
CL-USER 120 > (split-string-with-delimiter "..1.2.3.4.."
                :delimiter #\. :keep-delimiters nil)
("1" "2" "3" "4")
CL-USER 121 > (split-string-with-delimiter "..1.2.3.4.."
                :delimiter #\. :keep-delimiters t)
("." "." "1" "." "2" "." "3" "." "4" "." ".")
CL-USER 122 > (split-string-with-delimiter "1.2.3.4"
                :delimiter #\. :keep-delimiters nil)
("1" "2" "3" "4")
CL-USER 123 > (split-string-with-delimiter "1.2.3.4"
                :delimiter #\. :keep-delimiters t)
("1" "." "2" "." "3" "." "4")
Or modified to work with any sequence (lists, vectors, strings, ...):
(defun split-sequence-with-delimiter (sequence delimiter
                                      &key (keep-delimiters nil)
                                      &aux (end (length sequence)))
  (loop for start = 0 then (1+ pos)
        for pos   = (position delimiter sequence :start start)

        ; no more delimiter found
        when (and (null pos) (not (= start end)))
        collect (subseq sequence start)

        ; while delimiter found
        while pos

        ; some content found
        when (> pos start) collect (subseq sequence start pos)
        ; optionally keep delimiter
        when keep-delimiters collect (subseq sequence pos (1+ pos))))
If you want to split on several delimiters and keep them:
(defun split-string-with-delims (str delims)
  (labels ((delim-p (c)
             (position c delims))
           (tokens (stri test)
             (when (> (length stri) 0)
               (let ((p (position-if test stri)))
                 (if p
                     (if (= p 0)
                         (cons (subseq stri 0 (1+ p))
                               (tokens (subseq stri (1+ p) nil) test))
                         (cons (subseq stri 0 p)
                               (tokens (subseq stri p nil) test)))
                     (cons stri nil))))))
    (tokens str #'delim-p)))
You can call it with the delimiters given either as a list or as a string:
(split-string-with-delims ".,hello world,," '(#\. #\, #\ ))
; => ("." "," "hello" " " "world" "," ",")
(split-string-with-delims ".,hello world,,!!" "., ")
; => ("." "," "hello" " " "world" "," "," "!!")
Concerning your code: since there is subseq, I'd go for Rainer Joswig's approach (above) instead of your make-adjustable-string + vector-push-extend.
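The recursive split-string-with-delims above copies the rest of the string with subseq on every step and recurses once per token. A minimal iterative sketch of the same idea, using position-if with :start so that only the tokens themselves are copied, might look like this. The name split-string-with-delims-iter is made up for illustration, and the function is assumed to take the same arguments (a string plus a list or string of delimiter characters):

```lisp
(defun split-string-with-delims-iter (str delims)
  "Split STR at any character found in DELIMS (a list or a string),
keeping each delimiter as its own one-character string.
Hypothetical name; iterative variant of the recursive version."
  (let ((tokens nil)
        (start 0)
        (end (length str)))
    (loop while (< start end)
          do (let ((pos (position-if (lambda (c) (find c delims))
                                     str :start start)))
               (cond ((null pos)
                      ;; No delimiter left: the rest is the last token.
                      (push (subseq str start) tokens)
                      (setf start end))
                     ((= pos start)
                      ;; A delimiter at START becomes its own token.
                      (push (subseq str pos (1+ pos)) tokens)
                      (setf start (1+ pos)))
                     (t
                      ;; Text before the delimiter; the delimiter itself
                      ;; is handled on the next iteration.
                      (push (subseq str start pos) tokens)
                      (setf start pos)))))
    (nreverse tokens)))

(split-string-with-delims-iter ".,hello world,,!!" "., ")
; => ("." "," "hello" " " "world" "," "," "!!")
```

It should return the same lists as the recursive version for the two calls shown above.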
If you're just looking for a solution, and not for an exercise, you can use cl-ppcre:
CL-USER> (cl-ppcre:split "(\\.)" "a.bc.def.com" :with-registers-p t)
("a" "." "bc" "." "def" "." "com")
The problem lies in the end condition of the do* loop. The end test is checked before the body runs, so when i reaches the last index of the string the loop exits without processing that character, and the current-word built so far is never added to words. When the end condition is met you need to add x to current-word and then current-word to words, before exiting the loop:
(defun split-string-with-delimiter (string delimiter)
  "Splits a string into a list of strings, with the delimiter still
in the resulting list."
  (let ((words nil)
        (current-word (make-adjustable-string "")))
    (do* ((i 0 (+ i 1))
          (x (char string i) (char string i)))
         ((>= (+ i 1) (length string))
          (progn (vector-push-extend x current-word)
                 (push current-word words)))
      (if (eql delimiter x)
          (unless (string= "" current-word)
            (push current-word words)
            (push (string delimiter) words)
            (setf current-word (make-adjustable-string "")))
          (vector-push-extend x current-word)))
    (nreverse words)))
However, note that this version is still buggy: if the last character of string is a delimiter, it is appended to the last word instead of becoming its own element, e.g. (split-string-with-delimiter "a.bc.def." #\.) => ("a" "." "bc" "." "def.")
I'll let you add this check.
In any case, you might want to make this more efficient by looking ahead for the next delimiter and extracting all the characters between the current i and that delimiter at once, as one single substring.
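As a rough sketch of that look-ahead idea, here is one possible shape, which also handles a trailing delimiter. The name split-keeping-delimiter is made up for illustration. Unlike the loop above, it keeps every delimiter, including consecutive ones, which is an assumption about the desired behaviour rather than something the question specifies:

```lisp
(defun split-keeping-delimiter (string delimiter)
  "Split STRING at each DELIMITER character, returning the pieces and
the delimiters (as one-character strings) in order.
Hypothetical name; a sketch, not a drop-in replacement."
  (let ((words nil)
        (start 0)
        (end (length string)))
    (loop
      ;; Look ahead for the next delimiter at or after START.
      (let* ((pos (position delimiter string :start start))
             (stop (or pos end)))
        ;; Take everything up to the delimiter (or the end) in one go,
        ;; skipping empty pieces.
        (when (> stop start)
          (push (subseq string start stop) words))
        ;; No delimiter left: we are done.
        (unless pos
          (return (nreverse words)))
        (push (string delimiter) words)
        (setf start (1+ pos))))))

(split-keeping-delimiter "a.bc.def." #\.)
; => ("a" "." "bc" "." "def" ".")
```

Because it works with position and subseq on the original string, it needs neither make-adjustable-string nor vector-push-extend, and an empty input simply yields NIL.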
