Reputation: 1551
The following code is a measured hot spot, distilled from some code I'm writing. I'm trying to figure out how to speed up this loop in Python 3.9.0; I measure the same loop to be >30x faster using std::vector in VC++ 2019.
As you can see, I've tried a few different methods. The map() function returns an iterator, so I converted its result to a list to measure the full cost of execution.
I feel this is a fairly natural way to represent my data. I could certainly work on some representational or algorithmic improvements here. However, I'm surprised that iteration is this slow, and I'd like to see if the iteration itself can be improved first.
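(For reference, the laziness of map() can be confirmed directly; this is a minimal standalone check, separate from the benchmark below:)

```python
# map() returns a lazy iterator: no work happens until it is consumed,
# which is why the benchmark wraps it in list() to pay the full cost.
m = map(lambda x: x - 1, range(5))
print(type(m).__name__)  # map
print(list(m))           # [-1, 0, 1, 2, 3]
print(list(m))           # [] -- the iterator is now exhausted
```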
Performance results from executing: python listIteration.py
Iteration by index
66.66 ms
60.90 ms
62.74 ms
Total: 124998250000
Iteration by index -- just integers
55.22 ms
55.27 ms
80.84 ms
Total: 124998250000
Iteration by object
56.48 ms
60.30 ms
55.77 ms
Total: 124998250000
List comprehension
235.34 ms
328.15 ms
272.47 ms
Total: 124998250000
Map
310.81 ms
353.87 ms
300.27 ms
Total: 124998250000
Code:
import time


def makeList():
    data = []
    for i in range(500000):
        data.append([i, i, i])
    return data


def makeListOfInts():
    data = []
    for i in range(500000):
        data.append(i)
    return data


def dumpTime(delta):
    print("{:.2f} ms".format(1000.0 * delta))


NUM_TRIALS = 3

print("Iteration by index")
data = makeList()
for t in range(NUM_TRIALS):
    x1 = time.perf_counter()
    for j in range(len(data)):
        data[j][0] -= 1
    x2 = time.perf_counter()
    dumpTime(x2 - x1)
total = sum([x[0] for x in data])
print("Total: " + str(total))

print("Iteration by index -- just integers")
data = makeListOfInts()
for t in range(NUM_TRIALS):
    x1 = time.perf_counter()
    for j in range(len(data)):
        data[j] -= 1
    x2 = time.perf_counter()
    dumpTime(x2 - x1)
total = sum(data)
print("Total: " + str(total))

print("Iteration by object")
data = makeList()
for t in range(NUM_TRIALS):
    x1 = time.perf_counter()
    for v in data:
        v[0] -= 1
    x2 = time.perf_counter()
    dumpTime(x2 - x1)
total = sum([x[0] for x in data])
print("Total: " + str(total))

print("List comprehension")
data = makeList()
for t in range(NUM_TRIALS):
    x1 = time.perf_counter()
    data = [[x[0] - 1, x[1], x[2]] for x in data]
    x2 = time.perf_counter()
    dumpTime(x2 - x1)
total = sum([x[0] for x in data])
print("Total: " + str(total))

print("Map")
data = makeList()
for t in range(NUM_TRIALS):
    x1 = time.perf_counter()
    # here we convert the map object to a list, because map() returns
    # an iterator, and we want to measure the full cost of the computation
    data = list(map(lambda x: [x[0] - 1, x[1], x[2]], data))
    x2 = time.perf_counter()
    dumpTime(x2 - x1)
total = sum([x[0] for x in data])
print("Total: " + str(total))
Upvotes: 2
Views: 894
Reputation: 25489
Python code is going to be slower than C++. There is no way around it, unless you eliminate or outsource the iteration to a C backend, which is what numpy does.
For example, you could do
import numpy as np


def makeArray():
    data = np.vstack((np.arange(500000), np.arange(500000), np.arange(500000))).T
    return data


def makeArrayOfInts():
    data = np.arange(500000)
    return data
And then, you wouldn't need to iterate at all.
data = makeArray()
for t in range(NUM_TRIALS):
    x1 = time.perf_counter()
    data[:, 0] = data[:, 0] - 1
    x2 = time.perf_counter()
    dumpTime(x2 - x1)
total = sum(data[:, 0])
print("Total: " + str(total))

data = makeArrayOfInts()
for t in range(NUM_TRIALS):
    x1 = time.perf_counter()
    data = data - 1
    x2 = time.perf_counter()
    dumpTime(x2 - x1)
total = sum(data)
print("Total: " + str(total))
Both of these are very fast: each trial takes ~1 ms, as opposed to the ~50 ms needed to iterate over the lists.
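As a side note (not benchmarked here), NumPy also supports in-place updates, which avoid allocating a new array on every trial, and the array's own .sum() method stays in C rather than looping in Python like the builtin sum() used above:

```python
import numpy as np

data = np.arange(500000)
data -= 1                # in-place: no new array is allocated
total = int(data.sum())  # vectorized reduction, faster than builtin sum()
print("Total: " + str(total))  # 124999250000
```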
Upvotes: 2