Performance issue with Python list operations

Question

I'm learning python and had an excercise where I had a list of integers and had to keep one on two numbers, then 2 on 3, then 3 on 4... And see at the end what was left. Ex : [1,2,3,4,5,6,7,8,9] --> [1,3,5,7,9] --> [1,3,7,9] --> [1,3,7]

My first try, just to see the output (beware, shockingly dirty code) :

n=input()
list2=list(range(1,int(n))) 
list1=[]
step=2
while list1 != list2 :
    countSteps=1
    position=0
    list1=list2
    list2=[]
    lenlist1=len(list1)
    while position < lenlist1 :
      while countSteps != step and position < lenlist1 :
        list2.append(list1[position])
        countSteps+=1
        position+=1
      position+=1
      countSteps = 1
    step+=1
print(list2)

The "idea" there was to use two lists and add 1/2/3... on 2/3/4... numbers from one to the other. It seems to me (maybe I'm wrong) that memory wise it was a bad choice, so I came up with the following :

n=input()
list1=list(range(1,int(n)))
lenCached=0
step=1
while lenCached!=len(list1) :
  lenCached = len(list1)
  position = step
  while position < len(list1):
    list1.pop(position)
    position+=step
  step+=1
print(list1)

I'm happy with how this last one looks but performance is horrible. When I run the first one with range(1,1000000), it takes 10s while the second one takes ages (several minutes). Why and is there something I can do about it ?

(if reading this you thought of a better way to achieve the initial goal I would of course be interested in hearing it!)

Thibaut · Accepted Answer

The reason why your second code is much slower is that the pop operation on lists is very slow (its running time is proportional to the length of the list instead of being constant) as explained here, unless you pop an item from the end of the list.

Here is a code which has similar performance to your first code (slightly better on my machine) but looks more idiomatic. In particular, I'm using a list comprehension instead of the series of append that you are using to construct l2 from l1:

n=input()
l = list(range(1, n))
step = 2
while step <= len(l):
    l = [num for ind, num in enumerate(l) if (ind + 1) % step != 0]
    step += 1
print(l)

A numpy implementation which is much faster:

import numpy as np

n = input()
a = np.arange(1, n)
step = 2
while step <= len(a):
    mask = np.ones(len(a), dtype=bool)
    ind = np.arange(step-1,len(a), step)
    mask[ind] = False
    a = a[mask]
    step += 1
print(a)

By the way, this process is known as the Flavius Josephus's sieve.

Performance issue with Python list operations

Answers (1)

Related Questions