Munchkin
Munchkin

Reputation: 4946

python interpolation takes a long time

I do an interpolation with scipy.interpolate.griddata in a 1000 x 1000 grid. When I have a point-cloud with 1,000 (x,y,z)-values, calculation only takes a few seconds. But now I have 1,000,000 values. So I created a loop to extract 1,000 values out of these 1,000,000 values, like this:

p = [...]
z = [...]
#p and z are my lists with 1,000,000 values
p_new = []
z_new = []
for i in range(1000000):
    if condition:
        #condition is True for about 1000 times
        p_new.append(p[i])
        z_new.append(z[i])
print 'loop finished'

points = np.array(p_new)
values = np.array(z_new)
grid_z1 = griddata(points, values, (grid_x, grid_y), method='cubic')
plt.imshow(grid_z1.T, origin='lower')
plt.show()

print len(p_new) returns me 1000, so my loop works as expected. But after my loop finished, I canceled my program after 15minutes of waiting because nothing happened.

So finally my question is: Why takes this calculation so long although in both cases (1000 values by default and 1000 values extracting them out of 1000000) I have the same number of values? In my output loop finished I can see that loop only takes about 10sec, so it should have nothing to do with my loop =/

Upvotes: 3

Views: 2155

Answers (1)

ali_m
ali_m

Reputation: 74154

I can't see anything unusual going on here - as far as I can tell the time taken to interpolate is roughly proportional to the number of elements in the point cloud.

Here's some test data:

def fake_data(n):

    # xy coordinates for an n-by-n grid
    grid = np.indices((n,n),dtype=np.float32).reshape(2,-1).T

    # interpolated coordinates
    xy_i = grid.copy()
    # not monotonically increasing
    np.random.shuffle(xy_i)

    # values
    z = np.random.rand(n**2)

    # input coordinates
    xy = grid.copy()
    # not regularly gridded 
    xy += np.random.rand(*xy_i.shape)*0.25

    # pick n random points to use
    inc = np.random.choice(np.arange(n**2),(n,),replace=False)
    xy = grid[inc,:]
    z = z[inc]

    return xy, z, xy_i

enter image description here

For all three methods a log-log plot of N vs time is roughly a straight line, with a slope of ~2, i.e. they all require O(N^2) time.

If, in your case, you see that the lines are not straight but deviate upwards for large values of N, that might indicate that you're having some other problem, such as running out of memory and hitting the swap.

Upvotes: 1

Related Questions