Reputation: 2273
I have a function that works something like this:
import random
import numpy as np

def Function(x):
    a = random.random()
    b = random.random()
    c = OtherFunctionThatReturnsAThreeColumnArray()
    results = np.zeros((1, 5))
    results[0, 0] = a
    results[0, 1] = b
    results[0, 2] = c[-1, 0]
    results[0, 3] = c[-1, 1]
    results[0, 4] = c[-1, 2]
    return results
What I'm trying to do is run this function many, many times, appending the returned one-row, five-column result to a running data set. As I understand it, though, both the append function and a for-loop are ruinously inefficient. I'm trying to improve my code, and the number of runs is going to be large enough that this kind of inefficiency won't do me any favors.
What's the best way to do this with the least overhead?
Upvotes: 3
Views: 194
Reputation: 20373
You're correct in thinking that numpy.append or numpy.concatenate are going to be expensive if repeated many times (each call allocates a new array and copies the two previous arrays into it).
The best suggestion (if you know how much space you're going to need in total) would be to allocate the array before you run your routine, and then just put the results in place as they become available.
If you're going to run this nrows times, then
    results = np.zeros([nrows, 5])
and then add your results
    def function(x, i, results):
        <.. snip ..>
        results[i, 0] = a
        results[i, 1] = b
        results[i, 2] = c[-1, 0]
        results[i, 3] = c[-1, 1]
        results[i, 4] = c[-1, 2]
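For what it's worth, here is a minimal end-to-end sketch of the preallocation idea. The names other_function and fill_row are hypothetical, and other_function is only a stand-in for OtherFunctionThatReturnsAThreeColumnArray, which I'm assuming returns a 2-D array with three columns.

```python
import random
import numpy as np

def other_function():
    # Hypothetical stand-in for OtherFunctionThatReturnsAThreeColumnArray
    return np.random.rand(4, 3)

def fill_row(i, results):
    """Write one run's five values into row i of the preallocated array."""
    c = other_function()
    results[i, 0] = random.random()
    results[i, 1] = random.random()
    results[i, 2:5] = c[-1, :]  # last row of the three-column array

nrows = 1000
results = np.zeros((nrows, 5))  # allocated once, up front
for i in range(nrows):
    fill_row(i, results)        # writes in place, no per-run reallocation
```

The loop is still there, but each iteration only writes into existing memory, so the cost of copying a growing array goes away.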
Of course, if you don't know how many times you're going to be running function, this won't work. In that case, I'd suggest a less elegant approach:
Declare a possibly large results array and add to results[i, x] as above (keeping track of i and the size of results).
When you reach the size of results, do the numpy.append (or concatenate) with a new array. This is less bad than appending repetitively and shouldn't destroy performance, but you will have to write some wrapper code.
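The wrapper code could look something like the sketch below. GrowableRows is a hypothetical name, and doubling the capacity each time the buffer fills is my own assumption (any reasonably large chunk size would do); the point is only that concatenate gets called rarely rather than once per row.

```python
import numpy as np

class GrowableRows:
    """Hypothetical wrapper: a 2-D buffer that grows in chunks, not per row."""

    def __init__(self, ncols, capacity=1024):
        self._data = np.zeros((capacity, ncols))
        self._n = 0  # number of rows actually filled so far

    def add(self, row):
        if self._n == self._data.shape[0]:
            # Buffer is full: double the capacity with a single concatenate
            extra = np.zeros_like(self._data)
            self._data = np.concatenate([self._data, extra], axis=0)
        self._data[self._n] = row
        self._n += 1

    def view(self):
        # Return only the rows that have been filled
        return self._data[:self._n]

buf = GrowableRows(ncols=5, capacity=4)
for i in range(10):
    buf.add([i, i, i, i, i])
final = buf.view()
```

With doubling, the number of reallocations grows only logarithmically with the number of rows, which is why this stays cheap even when the final size is unknown.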
There are other ideas you could pursue. Off the top of my head you could:
Write the results to disk. Depending on the speed of OtherFunctionThatReturnsAThreeColumnArray and the size of your data, this may not be too daft an idea.
Save your results in a list comprehension (forgetting numpy until after the run). If function returned (a, b, c) rather than results:
    results = [function(x) for x in my_data]
and now do some shuffling to get results into the form you need.
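As a rough illustration of the list-comprehension route and the "shuffling" afterwards, here is one way it might go. other_function and my_data are hypothetical placeholders, and I'm assuming function returns the tuple (a, b, c) with c being the three-column array.

```python
import random
import numpy as np

def other_function():
    # Hypothetical stand-in for OtherFunctionThatReturnsAThreeColumnArray
    return np.random.rand(4, 3)

def function(x):
    a = random.random()
    b = random.random()
    c = other_function()
    return a, b, c

my_data = range(1000)  # hypothetical inputs

# Collect plain Python tuples first; appending to a list is cheap
raw = [function(x) for x in my_data]

# Build the array once at the end: a, b, then the last row of c
results = np.array([[a, b, *c[-1]] for a, b, c in raw])
```

Python lists over-allocate as they grow, so building the list is inexpensive, and numpy only has to allocate the final array a single time.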
Upvotes: 2