Quite often I need to replace a subsequence of certain elements with another sequence of the same type, but possibly of a different length. Implementing such a function is no challenge; this is what I use now:
```lisp
(defun substitute* (new old where &key key (test #'eql))
  (funcall (alambda (rest)
             (aif (search old rest :key key :test test)
                  (concatenate (etypecase rest
                                 (string 'string)
                                 (vector 'vector)
                                 (list 'list))
                               (subseq rest 0 it)
                               new
                               (self (subseq rest (+ it (length old)))))
                  rest))
           where))
```
It works like this:

```lisp
CL-USER> (substitute* '(x y) '(z) '(1 z 5 8 y z))
(1 X Y 5 8 Y X Y)
CL-USER> (substitute* "green button" "red button"
                      "here are red indicator, red button and red wire")
"here are red indicator, green button and red wire"
CL-USER> (substitute* #(4) #(2 2) #(2 2 2 2 2))
#(4 4 2)
```
As you can see, it's very handy and useful, so I have a feeling that I'm reinventing the wheel and that it must be in the standard library; I just don't know its name (sometimes names are not obvious: you may search for filter while what you need is set-difference).
As a compromise between clarity and efficiency, here is a revised version:
```lisp
(defun substitute* (new old where &key key (test #'eql))
  (let ((type (etypecase where
                (string 'string)
                (vector 'vector)
                (list 'list)))
        (new (coerce new 'list))
        (old (coerce old 'list))
        (where (coerce where 'list)))
    (coerce (funcall (alambda (rest)
                       (aif (search old rest :key key :test test)
                            ;; fresh copy of the first IT elements of REST
                            (append (remove-if (constantly t) rest :start it)
                                    new
                                    (self (nthcdr (+ it (length old)) rest)))
                            rest))
                     where)
            type)))
```
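Both versions of substitute* above depend on aif and alambda, the anaphoric macros from Paul Graham's On Lisp; they are not part of standard Common Lisp. To try the code, here is a minimal sketch of the two macros following On Lisp's definitions. It assumes they are defined in the same package as substitute*, so that the symbols it and self used there are the ones these macros bind.

```lisp
;; AIF: evaluate TEST-FORM, bind its value to IT, and branch on it.
(defmacro aif (test-form then-form &optional else-form)
  `(let ((it ,test-form))
     (if it ,then-form ,else-form)))

;; ALAMBDA: like LAMBDA, but SELF names the function inside BODY,
;; so an anonymous function can call itself recursively.
(defmacro alambda (parms &body body)
  `(labels ((self ,parms ,@body))
     #'self))
```

For example, (funcall (alambda (n) (if (<= n 1) 1 (* n (self (1- n))))) 5) returns 120.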
I don't think that there's any standard function for this. It's more complicated than the standard replace family of functions. Those can operate destructively because you know in advance that you can replace element by element. Even in that case, it's still somewhat difficult to do this efficiently, because the access times for lists and vectors are very different, so general-purpose functions like subseq can be problematic. As Rainer Joswig pointed out in a comment:
It's kind of unfortunate that for many algorithms over sequences there is no single efficient implementation. I see often that there are two versions, one for lists and one for vectors, which then get hidden behind a dispatching function. For a hack a simple common version is fine, but for a library function, often there are different implementations - like shown here.
(In fact, while doing a bit of research on whether some library contains a function for this, one of the first Google results I got was a question on Code Review, "Generic sequence splitter in Common Lisp", in which Rainer and I both made comments similar to those here.)
However, your implementation is rather inefficient because it makes multiple copies of the remainders of sequences. E.g., when you replace (z) in (1 z 2 z 3 z) with (x y), you'll first make (3 x y), then copy it in making (2 x y 3 x y), and then copy that in making (1 x y 2 x y 3 x y). You might be better off doing one pass over the sequence, determining the indices of the subsequences to replace, or collecting the bits that don't need to be replaced, etc. You'll probably want separate implementations for lists and for other sequences. E.g., with a list, you might do:
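```lisp
(defun splice-replace-list (old new list)
  (do ((new (coerce new 'list))
       (old-len (length old))
       (parts '()))                     ; pieces of the result, in reverse order
      ((endp list)
       (reduce 'append (nreverse parts) :from-end t))
    (let ((pos (search old list)))
      ;; copy of everything before the match (the whole rest if there is none)
      (push (subseq list 0 pos) parts)
      (cond
        ((null pos)
         (setf list nil))
        (t
         (push new parts)
         ;; skip past the matched occurrence of OLD
         (setf list (nthcdr (+ old-len pos) list)))))))
```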
There are some optimizations you could make here, if you wanted. For instance, you could implement a search-list that, rather than returning the position of the first instance of the sought sequence, returns as multiple values a copy of the head up to that point and the tail beginning with the sequence, or even the copied head and the tail after the sequence, since that's what you're really interested in here. Additionally, you could do something a bit more efficient than (reduce 'append (nreverse parts) :from-end t) by not reversing parts, but using a reversed append instead. E.g.,
```lisp
(flet ((xappend (l2 l1)
         (append l1 l2)))
  (reduce #'xappend '((5 6) (x y) (3 4) (x y))))
;=> (x y 3 4 x y 5 6)
```
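The two optimizations described above can be combined. What follows is a sketch under stated assumptions, not a tested library function. search-list, match-prefix and splice-replace-list* are hypothetical names (none of them are standard), search-list accepts only a :test argument (no :key), and OLD is assumed to be non-empty. As suggested above, search-list returns the copied head and the tail after the match as multiple values. parts is built in reverse and joined with the reversed append (xappend) instead of being passed through nreverse.

```lisp
;; Hypothetical helper (not standard): if PREFIX is a prefix of ITEMS under
;; TEST, return (values REMAINDER T), where REMAINDER is ITEMS with PREFIX
;; removed; otherwise return (values NIL NIL).
(defun match-prefix (prefix items test)
  (loop
    (cond ((endp prefix) (return (values items t)))
          ((or (endp items)
               (not (funcall test (car prefix) (car items))))
           (return (values nil nil)))
          (t (setf prefix (cdr prefix)
                   items (cdr items))))))

;; Hypothetical SEARCH-LIST: return (values HEAD TAIL FOUNDP), where HEAD is a
;; fresh copy of the elements before the first occurrence of OLD in ITEMS and
;; TAIL is what follows that occurrence.  If OLD does not occur, HEAD is a
;; copy of all of ITEMS, TAIL is NIL and FOUNDP is NIL.  Assumes OLD is
;; non-empty.
(defun search-list (old items &key (test #'eql))
  (let ((old (coerce old 'list))
        (head '()))
    (loop for tail on items
          do (multiple-value-bind (remainder foundp) (match-prefix old tail test)
               (when foundp
                 (return-from search-list (values (nreverse head) remainder t))))
             (push (car tail) head))
    (values (nreverse head) '() nil)))

;; Hypothetical splice replacement built on SEARCH-LIST.  PARTS stays in
;; reverse order and is joined with a reversed append, so no NREVERSE.
(defun splice-replace-list* (old new items)
  (let ((new (coerce new 'list))
        (parts '()))
    (flet ((xappend (l2 l1) (append l1 l2)))
      (loop
        (multiple-value-bind (head tail foundp) (search-list old items)
          (push head parts)
          (unless foundp
            (return (reduce #'xappend parts)))
          (push new parts)
          (setf items tail))))))
```

With these definitions, (splice-replace-list* '(z) '(x y) '(1 z 2 z 3 z)) returns (1 X Y 2 X Y 3 X Y). The last piece pushed is always a freshly copied head, so every copy of new in the result goes through append's copying, and the result does not share structure with new.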
I wrote this in a somewhat imperative style, but there's no reason that you can't use a functional style if you want. Be warned that not all Lisp implementations support tail call optimization, so it might be better to use do, but you certainly don't have to. Here's a more functional version:
```lisp
(defun splice-replace-list (old new list)
  (let ((new-list (coerce new 'list))
        (old-len (length old)))
    (labels ((keep-going (list parts)
               (if (endp list)
                   (reduce 'append (nreverse parts) :from-end t)
                   (let* ((pos (search old list))
                          (parts (list* (subseq list 0 pos) parts)))
                     (if (null pos)
                         (keep-going '() parts)
                         (keep-going (nthcdr (+ old-len pos) list)
                                     (list* new-list parts)))))))
      (keep-going list '()))))
```
For non-lists, this is more difficult, because you don't know the specific sequence type that you're supposed to use for the result. This is why functions like concatenate require a result-type argument. You can use array-element-type to get the element type of the input sequence, and then use make-array to get a sequence big enough to hold the result. That makes the code trickier. Here's a first attempt; it's more complicated, but you'll get a result that's pretty close to the original vector type:
```lisp
(defun splice-replace-vector (old new vector &aux (new-len (length new)))
  ;; Second pass: copy the kept ranges of VECTOR and copies of NEW into a
  ;; fresh array of the right length and element type.  PARTS holds
  ;; (BEGIN . END) index ranges into VECTOR, and T wherever NEW goes.
  (flet ((assemble-result (length parts)
           (let ((result (make-array length :element-type (array-element-type vector)))
                 (start 0))
             (dolist (part parts result)
               (cond
                 ((consp part)
                  (destructuring-bind (begin . end) part
                    (replace result vector :start1 start :start2 begin :end2 end)
                    (incf start (- end begin))))
                 (t
                  (replace result new :start1 start)
                  (incf start new-len)))))))
    ;; First pass: record the ranges to keep and the total result length.
    (do ((old-len (length old))
         (total-len 0)
         (start 0)
         (indices '()))
        ((null start) (assemble-result total-len (nreverse indices)))
      (let ((pos (search old vector :start2 start)))
        (cond
          ((null pos)
           (let ((vlength (length vector)))
             (push (cons start vlength) indices)
             (incf total-len (- vlength start))
             (setf start nil)))
          (t
           (push (cons start pos) indices)
           (push t indices)
           (incf total-len (- pos start))
           (incf total-len new-len)
           (setf start (+ pos old-len))))))))
```
```lisp
CL-USER> (splice-replace-vector '(#\z) '(#\x #\y) "12z")
"12xy"
CL-USER> (splice-replace-vector '(z) '(x y) #(x y))
#(X Y)
CL-USER> (splice-replace-vector '(z) '(x y) #(1 z 2 z 3 4 z))
#(1 X Y 2 X Y 3 4 X Y)
CL-USER> (splice-replace-vector '(#\z) #(#\x #\y) "1z2z34z")
"1xy2xy34xy"
```
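The "12z" example shows why array-element-type matters here: an array allocated with the input's element type is a string when the input is a string, whereas make-array without :element-type uses element type T and never produces a string. A small illustration, where make-vector-like is a hypothetical helper name:

```lisp
;; Hypothetical helper: allocate a vector of LENGTH elements whose element
;; type is taken from TEMPLATE, as splice-replace-vector does above.
(defun make-vector-like (template length)
  (make-array length :element-type (array-element-type template)))

;; A string template gives a string and a bit-vector template gives a bit
;; vector, while MAKE-ARRAY without :ELEMENT-TYPE gives a general vector.
(list (stringp (make-vector-like "abc" 3))
      (bit-vector-p (make-vector-like #*101 3))
      (stringp (make-array 3)))
;; → (T T NIL)
```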
If you only want to make one pass through the input vector, then you could use an adjustable array as the output, and append to it. An adjustable array will have a bit more overhead than a fixed-size array, but it does make the code a bit simpler.
```lisp
(defun splice-replace-vector (old new vector)
  (do ((vlength (length vector))
       (olength (length old))
       (vnew (coerce new 'vector))
       (result (make-array 0
                           :element-type (array-element-type vector)
                           :adjustable t
                           :fill-pointer 0))
       (start 0))
      ((eql start vlength) result)
    (let ((pos (search old vector :start2 start)))
      (cond
        ;; add the remaining elements in vector to result
        ((null pos)
         (do () ((eql start vlength))
           (vector-push-extend (aref vector start) result)
           (incf start)))
        ;; add the elements between start and pos to the result,
        ;; add a copy of new to result, and move start past the
        ;; matched occurrence of old
        (t
         ;; the copying here could be improved with adjust-array,
         ;; and replace, instead of repeated calls to vector-push-extend
         (do () ((eql start pos))
           (vector-push-extend (aref vector start) result)
           (incf start))
         (loop for x across vnew
               do (vector-push-extend x result))
         (incf start olength))))))
```
Using these two functions, you could define a general splice-replace that checks the type of the original input sequence and calls the appropriate function:
```lisp
(defun splice-replace (old new sequence)
  (etypecase sequence
    (list (splice-replace-list old new sequence))
    (vector (splice-replace-vector old new sequence))))
```
```lisp
CL-USER> (splice-replace #(z) '(x y) #(1 z 2 z 3 4 z))
#(1 X Y 2 X Y 3 4 X Y)
CL-USER> (splice-replace '(z) #(x y) '(1 z 2 z 3 4 z))
(1 X Y 2 X Y 3 4 X Y)
```
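The comment in the adjustable-array version of splice-replace-vector notes that the copying could be improved with adjust-array and replace instead of repeated calls to vector-push-extend. One way that might look, as a sketch: append-subseq and splice-replace-vector* are hypothetical names, OLD is assumed to be non-empty, and doubling the capacity when the result must grow is a choice made for this sketch, not something the answer specifies.

```lisp
;; Hypothetical helper (not standard): append the elements of SOURCE between
;; START and END to RESULT, an adjustable vector with a fill pointer.  The
;; storage grows at most once (via ADJUST-ARRAY) and the elements are copied
;; in bulk with REPLACE rather than one at a time with VECTOR-PUSH-EXTEND.
(defun append-subseq (result source &key (start 0) (end (length source)))
  (let* ((old-fill (fill-pointer result))
         (new-fill (+ old-fill (- end start))))
    (when (> new-fill (array-dimension result 0))
      ;; Grow to at least double the capacity so repeated appends stay cheap.
      (setf result (adjust-array result
                                 (max new-fill (* 2 (array-dimension result 0))))))
    ;; Move the fill pointer first so REPLACE can write into the new slots.
    (setf (fill-pointer result) new-fill)
    (replace result source :start1 old-fill :start2 start :end2 end)
    result))

;; Hypothetical variant of SPLICE-REPLACE-VECTOR built on APPEND-SUBSEQ.
;; Assumes OLD is non-empty (an empty OLD would match forever).
(defun splice-replace-vector* (old new vector)
  (let ((old-len (length old))
        (result (make-array (length vector)
                            :element-type (array-element-type vector)
                            :adjustable t
                            :fill-pointer 0)))
    (loop with start = 0
          for pos = (search old vector :start2 start)
          ;; copy everything up to the match (or to the end if there is none)
          do (append-subseq result vector :start start
                                          :end (or pos (length vector)))
             (if pos
                 (progn
                   (append-subseq result new)
                   (setf start (+ pos old-len)))
                 (return result)))))
```

Because result is created with :adjustable t, adjust-array modifies it in place and returns the same array, which is why splice-replace-vector* can ignore the value returned by append-subseq.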