Reputation: 855
Say you have a list made up of lists. For example, a list A: (list '(1 2 3) '(1 4 3) )
. Further, you are given a list B: '(0 2 3)
. The task is: determine which sublist of A matches B the most. Note here that matching means the same integers in the same positions of the list. So for this case, the answer is the sublist '(1 2 3 )
. How can you automate this using the lisp loop macro? Below is my attempt.
(defun select-most-specific-list (listA listB)
(loop with candidate_sublist = '()
for sublist in listA
do (loop for number1 in sublist
for number2 in listB
when (= number1 number2)
do (setq candidate_sublist sublist)
finally (return candidate_list))))
I give the below input:
(select-most-specific-list (list '(1 2 3) '(1 4 3) ) '(0 2 3))
I get NIL
.
In addition, I am almost certain that my logic is wrong. With the above input, I expected it to give '(1 4 3)
instead of the right answer '(1 2 3)
. This is because a closer look at my logic will show that I do not store the results of all comparisons. So the last successful comparison erroneously dictates the most specific sublist. How can I achieve this?
Upvotes: 2
Views: 466
Reputation: 9865
I broke it down to:
(defun similarity (list1 list2)
(loop for number1 in list1
for number2 in list2 ;thanks @Rainer Joswig!
count (= number1 number2))) ;and @jkiiski!
(defun most-similar-list (lists qry-list &key (dist-func #'similarity))
(let* ((simils (loop for l in lists ;thanks @coredump!
collect (funcall dist-func l qry-list)))
(max-simil (reduce #'max simils)) ;thanks @Rainer Joswig!
(idx-max-simil (position max-simil simils :test #'=)))
(elt lists idx-max-simil)))
Your example
(most-similar-list (list '(1 2 3) '(1 4 3) ) '(0 2 3))
;; (1 2 3)
Appendix
How not to define distance
:
;; (defun distance (list1 list2)
;; (apply '+ (loop for number1 in list1 ;reduce is better! by @Rainer Joswig
;; for number2 in list2 ;because of the limit of arg numbers
;; collect (if (= number1 number2) 1 0)))) ;for apply (max ~50
;; - the exact number is implementation-dependent - see comments by him)
More general (collect all minimal distance / most similar elements - as suggested by @coredump)
Using the function from Position of All Matching Elements in List which I modified for variable tests:
(defun all-positions (qry-el l &key (test #'=))
(loop for el in l
and pos from 0
when (funcall test el qry-el)
collect pos))
Then the solution:
(defun select-most-similar (lists qry-list &key (dist-func #'distance))
(let* ((dists (loop for l in lists ;thanks @coredump!
collect (funcall dist-func l qry-list)))
(max-dist (reduce #'max dists)) ;thanks @Rainer Joswig!
(max-dist-idxs (all-positions max-dist dists :test #'=)))
(loop for i in max-dist-idxs
collect (nth i lists))))
Or taking @coredump's function and generalizing (I used not min but max):
(defun similarity (l1 l2)
(loop for e1 in l1
for e2 in l2
count (= e1 e2)))
(defun most-specific-lists (lists one-list &key (dist-func #'similarity))
(loop for l in lists
for dist = (funcall dist-func l one-list)
for max-dist = dist then (max dist max-dist)
for max-dist-l = (list l) then
(cond ((= dist max-dist) (cons l max-dist-l))
((> dist max-dist) (list l))
(t max-dist-l))
finally (return (values max-dist-l max-dist))))
Upvotes: 2
Reputation: 38809
(loop for number1 in sublist
for number2 in listB
when (= number1 number2)
do (setq candidate_sublist sublist)
finally (return candidate_list))
As soon as two number matches in your list, you replace candidate_sublist
, even if it is worse than the previous binding. Suppose candidate_sublist
is (0 2 3)
, which is equal to the input list (you can't be more similar than that). Then, you iterate over the next candidate which is (0 9 9)
. With your code, since (= 0 0)
, you change candidate_sublist
. You really have to inspect all values in both lists being compared before making a decision.
You are trying to define a comparison function among lists: a list is better than another one if it is more similar to the given list. This can be implemented by relying on a distance function. The following one is good enough:
(defun distance (a b)
(count nil (mapcar #'= a b)))
Or, with a loop:
(defun distance (a b)
(loop
for aa in a
for bb in b
count (/= aa bb)))
Thus, the more difference there is among two lists, the higher the distance is.
This comparison, however, defines a partial order, because you can easily have two lists that are equally close to the input. For exampe, given the following list:
(0 1 2)
Both (1 1 2)
and (0 1 1)
have the same number of matching values.
You cannot just return a single best answer, or else you are going to choose one based on an arbitrary criterion (e.g. an implementation detail, like the traversal order of your lists). What I would do is compute all lists that are at an equal distance to the input list.
(defun closest-lists (list candidates)
(loop
for candidate in candidates
for distance = (distance list candidate)
for better = T then (< distance min-distance)
for min-distance = (if better distance min-distance)
for best-matches = (cond
(better (list candidate))
((= distance min-distance) (cons candidate best-matches))
(t best-matches))
finally (return (values best-matches min-distance))))
As said in comments by @Gwang-Jin Kim, the closest-lists
function can even be used with other distance functions if we add it as a parameter. Following the naming convention from sort
, we could define a predicate
argument to specify the comparison function, and a key
argument to specify how to retrieve the values to be compared (the score). Then, our function is actually no more related to lists, and can be renamed to be more general:
(defun filter-by-score (candidates predicate &key (key #'identity))
"Keep elements from CANDIDATES having the same best rank according to PREDICATE.
PREDICATE should return non-NIL if its first argument precedes its
second one. Elements are compared according the value returned by
applying KEY. The KEY function is guaranteed to be applied once only
for each element in CANDIDATES."
(loop
for candidate in candidates
for score = (funcall key candidate)
for better = T then (funcall predicate score best-score)
for best-score = (if better score best-score)
for best-items = (cond
(better (list candidate))
((funcall predicate best-score score) best-items)
(t (cons candidate best-items)))
finally (return (values best-items best-score))))
Then, our previous function could be expressed as:
(filter-by-score candidates #'< :key (lambda (u) (distance list u)))
But we can also do:
CL-USER> (filter-by-score '("a" "ab" "cd" "ed" "fe" "aaa" "bbb" "nnn")
#'> :key #'length)
("nnn" "bbb" "aaa")
3
Or even:
CL-USER> (import 'alexandria:curry)
CL-USER> (ql:quickload :levenshtein)
CL-USER> (filter-by-score '("boat" "baobab" "brain" "biscuit")
#'<
:key (curry #'levenshtein:distance "ball"))
("brain" "boat")
3
Upvotes: 4