Reputation: 33
I am trying to write a Lisp function that checks whether one string is included in another, but I cannot get it to work.
For example:
(string-include 'abd 'abbbe) => nil
(string-include 'ghf 'dghfd) => ghf
Here is my function:
(defun string-include (string1 string2)
(cond
((not string1) 0)
((not string2) 0)
((.... (string1) (string2)) (string1 (string-include string1 (cdr string2))))
((string-include string1 (cdr string2)) ) )
Upvotes: 2
Views: 4891
Reputation: 18927
Judging by your code, what you are looking for is something like this:
(defun string-include (string1 string2)
  (cond
    ((zerop (length string1)) nil) ; string1 is empty (no need to test it every time)
    ((> (length string1) (length string2)) nil) ; string1 is longer than string2
    ((string= string1 (subseq string2 0 (length string1))) string1) ; string2 starts with string1
    (t (string-include string1 (subseq string2 1))))) ; otherwise shorten string2 by 1 and start over
This works but it is inefficient and not idiomatic Common Lisp. Just make sure that you actually pass strings and not symbols like in your example:
? (string-include "abd" "abbbe")
NIL
? (string-include "ghf" "dghfd")
"ghf"
Of course, Joshua's answer is the recommended solution.
EDIT
Added a version that works with both symbols and strings (but returns strings anyway). I took the opportunity to include one of Joshua's suggestions:
(defun string-include (string1 string2)
  (let* ((string1 (string string1))
         (length1 (length string1)))
    (if (zerop length1)
        nil
        (labels ((sub (s)
                   (cond
                     ((> length1 (length s)) nil)
                     ((string= string1 s :end2 length1) string1)
                     (t (sub (subseq s 1))))))
          (sub (string string2))))))
Testing:
? (string-include "abd" "abbbe")
NIL
? (string-include "ghf" "dghfd")
"ghf"
? (string-include 'abd 'abbbe)
NIL
? (string-include 'ghf 'dghfd)
"GHF"
? (string-include "ghf" '|dghfd|)
"ghf"
? (string-include '|ghf| "dghfd")
"ghf"
Upvotes: 4
Reputation: 85883
In your question, you used this example:
(string-include 'abd 'abbbe) => nil
(string-include 'ghf 'dghfd) => ghf
Assuming that you're returning the symbols nil and ghf, you'll run into an ambiguity if you ever want to check whether a string contains the substring NIL. E.g., with this approach, you'll have:
(string-include 'nil 'vanilla) => nil
Did that return nil because "NIL" is in "VANILLA", or because it isn't? It's ambiguous and you can't tell. Instead, you could return the actual string, since the string "NIL" is a true value. Even better, if you return the index of the match, then you also find out where in the other string the first string appears. That's the way that the built-in function search behaves, for instance.
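As a quick illustration of that behavior, here is a minimal sketch of calling search directly. The inputs are just the example strings used above, written as strings rather than symbols.

```lisp
;; SEARCH returns the index in the second sequence at which the first
;; sequence begins, or NIL when there is no match.
(search "NIL" "VANILLA")   ; => 2
(search "abd" "abbbe")     ; => NIL

;; A match at the very start yields 0, which still counts as true.
(if (search "VAN" "VANILLA") :found :missing)   ; => :FOUND
```

Because the "found" result is an index rather than the needle itself, looking for "NIL" is no longer ambiguous.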
You can implement this in terms of search:
(defun substringp (needle haystack &key (test 'char=))
  "Returns the index of the first occurrence of the string designated
by NEEDLE within the string designated by HAYSTACK, or NIL if it does
not occur. Characters within the string are compared by TEST, which
defaults to CHAR= (for case-sensitive comparison)."
  (search (string needle)
          (string haystack)
          :test test))
Note the use of the string function to convert from string designators (characters, strings, and symbols) to the strings that they designate. Remember that with the standard settings, the reader upcases the names of symbols, so the symbol cat designates the string "CAT". Finally, since this returns the result from search, it does double duty for you: it returns the index of the first occurrence if there is an occurrence, and nil otherwise. Remember that everything except nil is a true value (even 0), so you can use the result as a boolean or as an index (as long as you check that it's not nil). Here are some examples:
CL-USER> (substringp "cat" "concatenate")
3
CL-USER> (substringp "dog" "concatenate")
NIL
;; Default upcasing of symbol names means that the
;; result of 'cat is a symbol named "CAT", which is not
;; in "concatenate".
CL-USER> (substringp 'cat "concatenate")
NIL
;; You can test the characters with CHAR-EQUAL, which
;; is case insensitive, in which case "CAT" is in
;; "concatenate".
CL-USER> (substringp 'cat "concatenate" :test 'char-equal)
3
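To make the string designator conversion concrete, here is a small sketch of what the string function returns for each kind of designator. It assumes the standard readtable, where the reader upcases symbol names.

```lisp
;; STRING converts a string designator into the string it designates.
(string "cat")    ; => "cat"  (a string designates itself)
(string 'cat)     ; => "CAT"  (the reader upcased the symbol's name)
(string '|cat|)   ; => "cat"  (vertical bars preserve the case)
(string #\c)      ; => "c"    (a character designates a one-character string)
```

This is also why the symbol test above only matches "concatenate" once the comparison is made case insensitive.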
Your code, and the code that uselpa showed in another answer, are more recursive in nature. That in and of itself is not a problem, but recursive string processing in Common Lisp is prone to a few pitfalls. It's inefficient to make lots of new strings by using subseq, so lots of sequence functions in Common Lisp take :start and :end arguments, or, in the case of functions that take two sequences, :start1, :end1, :start2, and :end2 arguments. By using these, you can recurse and change the indices into the strings, rather than creating entirely new strings. For instance, string= accepts these arguments, so you can compare parts of two strings in place:
;; "toc" is in both "octocat" and "toccata"
CL-USER> (string= "octocat" "toccata" :start1 2 :end1 5 :end2 3)
T
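The same idea works with search itself. As a rough sketch, the following collects every position at which one string occurs in another by advancing :start2 instead of copying the tail of the string. The name all-occurrences is a hypothetical helper, not something from the answers here, and it assumes a non-empty needle.

```lisp
(defun all-occurrences (needle haystack)
  "Return a list of every index at which NEEDLE begins in HAYSTACK.
ALL-OCCURRENCES is a hypothetical helper used only for illustration."
  ;; An empty needle matches everywhere, so reject it up front.
  (when (zerop (length needle))
    (error "NEEDLE must not be empty."))
  (loop with start = 0
        for pos = (search needle haystack :start2 start)
        while pos
        collect pos
        ;; Resume one past the match so overlapping matches are found too.
        do (setf start (1+ pos))))

(all-occurrences "an" "banana")   ; => (1 3)
(all-occurrences "aa" "aaaa")     ; => (0 1 2)
(all-occurrences "x" "banana")    ; => NIL
```

No intermediate strings are created; only the starting index changes between calls.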
Working with these kinds of functions requires a bit of care to make sure you don't provide any indices that are out of range, but it's not too bad, and you don't end up copying any strings. Here's a version of substringp that accepts these start and end parameters, and uses a local recursive function to do the actual processing.
(defun substringp (string1 string2
                   &key
                     (start1 0) (end1 nil)
                     (start2 0) (end2 nil))
  "Returns the index of the first occurrence of the substring of
STRING1 bounded by START1 and END1 within the substring of STRING2
bounded by START2 and END2, or NIL if the string does not appear. The
index is a position within STRING2 as a whole."
  ;; First, compute the actual strings designated by STRING1 and
  ;; STRING2, and the values for END1 and END2, which default to the
  ;; length of the respective strings. Also get the length of the
  ;; substring in STRING1 that we're looking for. This is done just
  ;; once. The actual recursive portion is handled by the local
  ;; function %SUBSTRINGP.
  (let* ((string1 (string string1))
         (string2 (string string2))
         (end1 (or end1 (length string1)))
         (end2 (or end2 (length string2)))
         (len1 (- end1 start1)))
    (labels ((%substringp (start2 &aux (end2-curr (+ start2 len1)))
               (cond
                 ;; If end2-curr is past end2, then we're done, and
                 ;; the string was not found.
                 ((> end2-curr end2) nil)
                 ;; Otherwise, check whether the substrings match. If
                 ;; they do, return the current start2, which is the
                 ;; index of the substring within string2.
                 ((string= string1 string2
                           :start1 start1 :end1 end1
                           :start2 start2 :end2 end2-curr)
                  start2)
                 ;; If that doesn't match, then recurse, starting one
                 ;; character farther into string2.
                 (t (%substringp (1+ start2))))))
      (%substringp start2))))
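The Common Lisp standard does not require implementations to eliminate tail calls, so on very long strings a recursive scan can, in principle, exhaust the stack. As a hedged alternative, here is a sketch of the same bounded search written with loop. The name substringp-iterative is hypothetical, and the code assumes the caller supplies indices that are in range.

```lisp
(defun substringp-iterative (string1 string2
                             &key
                               (start1 0) (end1 nil)
                               (start2 0) (end2 nil))
  "Like the bounded SUBSTRINGP, but scans with LOOP instead of recursion.
SUBSTRINGP-ITERATIVE is a hypothetical name used only for illustration."
  (let* ((string1 (string string1))
         (string2 (string string2))
         (end1 (or end1 (length string1)))
         (end2 (or end2 (length string2)))
         (len1 (- end1 start1)))
    ;; The last viable starting position is END2 - LEN1; past that, the
    ;; candidate window would extend beyond END2. If that position is
    ;; below START2, the loop body never runs and NIL is returned.
    (loop for i from start2 to (- end2 len1)
          when (string= string1 string2
                        :start1 start1 :end1 end1
                        :start2 i :end2 (+ i len1))
            return i)))

(substringp-iterative "cat" "concat")                      ; => 3
(substringp-iterative "dog" "concatenate")                 ; => NIL
(substringp-iterative "octocat" "toccata" :start1 2 :end1 5) ; => 0
(substringp-iterative "cat" "concatenate" :start2 4)       ; => NIL
```

The first call is worth noting: the match ends exactly at the end of the second string, which is the boundary case where an off-by-one in the termination test is easiest to make.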
Upvotes: 7