Reputation: 461
I have run into an oddity with creating nested lists in Python 2.6.6.
Consider the following two functions:
import time

def lists(n):
    start_time = time.time()
    lists = [None] * n
    for i in xrange(n):
        lists[i] = [None] * n
        for j in xrange(n):
            lists[i][j] = []   # n*n empty lists in total
    print time.time() - start_time
def simple_lists(n):
    start_time = time.time()
    lists = [None] * n
    for i in xrange(n):
        lists[i] = [None] * n
        for j in xrange(n):
            lists[i][j] = False
    print time.time() - start_time
They both allocate an array of size n*n. One assigns all values as "False", and one assigns all values as empty lists. They should both run in O(n^2) as far as I can see. However, that does not seem to be the case. Observe the following test runs:
>>> for i in [4000, 8000, 16000]: simple_lists(i)
2.11170578003
8.67467808723
34.0958559513
>>> for i in [1000, 2000, 4000]: lists(i)
1.13742399216
7.39806008339
78.0808939934
As you can see, simple_lists() seems to grow at roughly O(n^2), while lists() seems to grow at more than O(n^3)!
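(To quantify that: doubling n should multiply an O(n^k) runtime by 2^k, so the exponent is just log2 of the ratio of consecutive timings. A quick check against the numbers above:)

from math import log

# Estimated exponent k from the last two timings of each run above.
print log(34.0958559513 / 8.67467808723, 2)   # ~1.97: simple_lists is ~n^2
print log(78.0808939934 / 7.39806008339, 2)   # ~3.40: lists is worse than n^3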
What's going on here? This quirk is completely and utterly ruining the complexity of my code, and I can't explain why it's behaving like this. Does anyone have any ideas?
Edit: List comprehensions seem to cause the same complexity issues.
Redefining lists() as
def lists(n):
    start_time = time.time()
    lists = [[[] for y in xrange(n)] for x in xrange(n)]
    print time.time() - start_time
gives the following results:
>>> for i in [1000, 2000, 4000]: lists(i)
0.388785839081
4.45830011368
65.6449248791
... which is still clearly growing faster than O(n^2) (and even faster than O(n^3) - sheesh).
edit2: After looking further into the problem, it seems it's caused by the garbage collector, presumably because every one of the n^2 new lists is a container tracked by the cyclic collector, so collections keep firing as allocations pile up and each pass has to scan more and more live objects. After running gc.disable(), this is the result with the original lists() definition:
>>> for i in [1000, 2000, 4000]: lists(i)
...
0.155457019806
0.616811990738
2.38965821266
which is pretty damn neatly O(n^2).
Moral of the story: don't trust the garbage collector!
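For reference, a safer way to run the same experiment is to pause the collector only around the timed region, so it comes back on even if the allocation raises (a sketch; lists_nogc is just an illustrative name):

import gc
import time

def lists_nogc(n):
    # Same allocation as lists(), but with the cyclic collector
    # paused only for the timed region.
    gc.disable()
    try:
        start_time = time.time()
        outer = [None] * n
        for i in xrange(n):
            outer[i] = [None] * n
            for j in xrange(n):
                outer[i][j] = []
        print time.time() - start_time
    finally:
        gc.enable()  # restore normal collection afterwards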
Upvotes: 3
Views: 1492
Reputation: 101
Generate the inner list once first; building the outer list is then O(n) time.
import time

def simple_list(n):
    start_time = time.process_time()
    k = [[] for y in range(n)]      # build one row of n empty lists
    lists = [k for x in range(n)]   # n references to the same row object k
    end_time = time.process_time()
    print(end_time - start_time)
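Note that every row here is a reference to the same object k, so this is only equivalent to the question's lists() if the rows are never mutated independently. A quick demonstration:

k = [[] for y in range(3)]
rows = [k for x in range(3)]
rows[0][0].append(1)
print(rows[1][0])           # [1] -- all rows share the same inner lists
print(rows[0] is rows[1])   # True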
Upvotes: 0
Reputation: 461
Turns out the weird behavior was caused by the garbage collector. Turning it off with gc.disable() yielded the following:
>>> for i in [1000, 2000, 4000]: lists(i)
...
0.155457019806
0.616811990738
2.38965821266
which fits O(n^2) like a glove.
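If you'd rather not switch collection off entirely, raising the generation-0 threshold is a gentler option (a sketch; the defaults are (700, 10, 10) in CPython):

import gc

# Fire generation-0 collections far less often during bulk allocation,
# instead of disabling the collector outright.
g0, g1, g2 = gc.get_threshold()   # (700, 10, 10) by default
gc.set_threshold(100000, g1, g2)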
Upvotes: 2
Reputation: 413
On my machine
for i in [1000, 2000, 4000]: lists(i)
gives
0.994000196457
4.31200003624
17.9900000095
which is a nice, neat O(n^2). The last one consumes 1 GB of memory, and lists(8000) thrashes the hard drive and makes my machine misbehave. delnan is probably right; I'd check your machine's free memory and Python's memory consumption during the operation.
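You can check the memory consumption from inside the process on Unix-like systems with something like this (a sketch, with the question's lists() in scope; note ru_maxrss is kilobytes on Linux but bytes on macOS):

import resource

def peak_rss_mb():
    # Peak resident set size of this process so far, in MB (Linux units).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

lists(4000)
print peak_rss_mb()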
Upvotes: 2