Oliver Amundsen

Reputation: 1511

for-in loop's upper limit changing in each loop

How can I update the upper limit of a loop on each iteration? In the following code, List is shortened on every pass. However, the lenList used as the bound of the for ... in loop is not updated, even though I declared lenList as global. Any ideas how to solve this? (I'm using Python 2.x.) Thanks!

def similarity(List):
    import difflib
    lenList = len(List)
    for i in range(1,lenList):
        import numpy as np
        global lenList
        a = List[i]
        idx = [difflib.SequenceMatcher(None, a, x).ratio() for x in List]
        z = idx > .9
        del List[z]
        lenList = len(List)


X = ['jim','jimmy','luke','john','jake','matt','steve','tj','pat','chad','don']
similarity(X)

Upvotes: 0

Views: 2929

Answers (5)

lvc

Reputation: 35089

This problem will become quite a lot easier with one small modification to how your function works: instead of removing similar items from the existing list, create and return a new one with those items omitted.

For the specific case of just removing items similar to the first item, this simplifies down quite a bit, and removes the need to involve NumPy's fancy indexing (which you weren't actually using anyway, because of a missing call to np.array):

import difflib

def similarity(lst):
    a = lst[0]
    # keep a, plus only those later items that are NOT too similar to it
    return [a] + \
        [x for x in lst[1:] if difflib.SequenceMatcher(None, a, x).ratio() <= .9]

From this basis, repeating it for every item in the list can be done recursively - you need to pass the list comprehension at the end back into similarity, and deal with receiving an empty list:

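import difflib

def similarity(lst):
    if not lst:
        return []  # base case: nothing left to filter
    a = lst[0]
    # keep a, then recurse on the remaining items that are not too similar to it
    return [a] + similarity(
        [x for x in lst[1:] if difflib.SequenceMatcher(None, a, x).ratio() <= .9])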
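Each recursive call adds a stack frame, so on a long list this version can run into CPython's default recursion limit (1000). A minimal iterative sketch of the same idea, assuming the goal is to omit every item whose SequenceMatcher ratio against an already-kept item exceeds .9, is below; the name similarity_iterative and the threshold parameter are hypothetical, not part of the original answer.

```python
import difflib


def similarity_iterative(lst, threshold=0.9):
    """Hypothetical helper: return a new list with near-duplicates dropped.

    An item is kept only if its similarity ratio to every item kept
    before it is at most `threshold`.
    """
    kept = []
    for x in lst:
        # all() is True for an empty `kept`, so the first item is always kept.
        if all(difflib.SequenceMatcher(None, k, x).ratio() <= threshold
               for k in kept):
            kept.append(x)
    return kept


print(similarity_iterative(['jim', 'jim', 'pat', 'jimmy']))  # ['jim', 'pat', 'jimmy']
```

Like the recursive version, it never mutates the input list, so there is no loop bound to keep in sync.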

Also note that importing inside a function and naming a variable List (easily confused with the built-in list, which a lowercase list would shadow outright) are both practices worth avoiding, since they can make your code harder to follow.

Upvotes: 0

g.d.d.c

Reputation: 48028

Looping over indices is generally considered unidiomatic in Python. You may be able to accomplish what you want like this, though (edited in response to comments):

def similarity(alist):
  position = 0
  while position < len(alist):
    item = alist[position]
    position += 1
    # code here that modifies alist

Because the while condition calls len(alist) again on every pass, the bound follows the list as it grows or shrinks while you manipulate its items.
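To make the "code here that modifies alist" placeholder concrete, here is a sketch that removes near-duplicates in place, assuming the question's difflib ratio and .9 cutoff; remove_similar_in_place and threshold are hypothetical names. Because it only ever deletes items after position, the outer loop never skips an item, and walking those later indices from the end keeps the inner range(...) valid while items are removed.

```python
import difflib


def remove_similar_in_place(alist, threshold=0.9):
    """Hypothetical helper: delete later items too similar to an earlier one."""
    position = 0
    # len(alist) is re-evaluated on every pass, so the bound tracks deletions.
    while position < len(alist):
        item = alist[position]
        # Walk the later indices backwards: deleting alist[j] only shifts
        # items after j, which this inner loop has already checked.
        for j in range(len(alist) - 1, position, -1):
            if difflib.SequenceMatcher(None, item, alist[j]).ratio() > threshold:
                del alist[j]
        position += 1
    return alist


names = ['jim', 'jim', 'pat', 'jim']
remove_similar_in_place(names)
print(names)  # ['jim', 'pat']
```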

Additionally, if you absolutely have to have indices, you can get those as well:

for idx, item in enumerate(alist):
  # code here, where item is the actual list entry and
  # idx is its 0-based index in the list.
  pass

In Python 2.6 and later (including 3.x) you can also pass an optional start argument to enumerate to control the starting value of idx.
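For example, with a short made-up list, start=1 makes the index count from 1 instead of the default 0:

```python
names = ['jim', 'jimmy', 'luke']

# start=1 makes idx count from 1 instead of the default 0.
numbered = []
for idx, item in enumerate(names, start=1):
    numbered.append('%d: %s' % (idx, item))

print(numbered)  # ['1: jim', '2: jimmy', '3: luke']
```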

Upvotes: 2

spicavigo

Reputation: 4224

What you are effectively looping over in the above code is a list that was generated once, before the first iteration.

You could just as well have written the above as

li = range(1, lenList)  # built once, before the loop starts
for i in li:
    pass  # ... your code ...

Changing lenList after li has been created has no effect on li.
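A small sketch of that behaviour, using a hypothetical indices_seen function that mimics the question's loop: the list is shortened and lenList is reassigned on every pass, yet the loop still runs over the range built at the start.

```python
def indices_seen(lst):
    """Hypothetical demo: record which i values the for loop actually visits."""
    lenList = len(lst)
    seen = []
    for i in range(1, lenList):  # range(1, lenList) is evaluated once, here
        seen.append(i)
        lst.pop()                # shorten the list, as in the question...
        lenList = len(lst)       # ...and rebind lenList; the range is unaffected
    return seen


print(indices_seen(['a', 'b', 'c', 'd', 'e']))  # [1, 2, 3, 4]
```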

Upvotes: 0

osandov

Reputation: 127

The range is an object which is constructed before the first iteration of your loop, so you are iterating over the values in that object. You would instead need to use a while loop, although, as Lattyware and g.d.d.c point out, that would not be very Pythonic.
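A minimal sketch of the while alternative, with a made-up list: the condition, including len(lst), is re-checked before every pass, so the loop stops as soon as i catches up with the shrinking list. The same body under for i in range(1, 5) would raise an IndexError on its third pass, since by then lst has only three items.

```python
lst = ['a', 'b', 'c', 'd', 'e']
visited = []

i = 1
while i < len(lst):  # len(lst) is evaluated again before every pass
    visited.append(lst[i])
    lst.pop()        # shrink the list from the end
    i += 1

print(visited)  # ['b', 'c']
```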

Upvotes: 0

Gareth Latty

Reputation: 89067

The issue here is that range() is evaluated only once, at the start of the loop, and produces a range object (a list in 2.x) at that time. You can't then change the range. On top of that, integers are immutable, so assigning a new value to lenList just rebinds the name; it doesn't affect the range that was already built from the old value.

The best solution is to change the way your algorithm works so that it doesn't rely on this behaviour.

Upvotes: 1
