Reputation: 117

Python: Speeding up a slow for-loop calculation (np.append)

I have an input file called "cmp_twtt_amp_rho" which is 7795074 lines long. I would like to calculate sound speed, c, for each line where:

c(i) = rho(i-1) * c(i-1) * (-1-amp(i)) / rho(i) * (amp(i)-1)

using an initial guess of c=1450.

I have written a for loop that I believe will work, however it gets increasing slower with time, so much so that it is not conceivable to run in it's current format.

Could someone help me speed up this piece of code please?

data=np.genfromtxt('./cmp_twtt_amp_rho')
cmp_no=data[:,[0]]
twtt=data[:,[1]]
amp=data[:,[2]]
rho=data[:,[3]]

cs=[]

for i in range(1,len(amp-1)):
  if i == 1:
    print "Using an initial guess of 1450 m/s"
    c = (rho[i-1]*1450*(-1-amp[i]))/(rho[i]*(1-amp[i]))
    cs = np.append(c,cs)
  elif twtt[i] == 0:
    print "Reached new cmp #: ",cmp_no[i],"as twwt has re-started at ",twtt[i]
    c = 1450
    cs = np.append(c,cs)
  else:
    print i
    c = (rho[i-1]*cs[i-1]*(-1-amp[i]))/(rho[i]*(1-amp[i]))
    cs = np.append(c,cs)

print min(cs), max(cs)
print len(cs)

Upvotes: 0

Answers (3)

user2357112

Reputation: 280688

Appending to arrays is really slow, since you have to allocate a whole new array every time. Doing it in a loop is pretty much always going to kill performance.

Instead of appending in a loop, or even using a Python-level loop at all, you can get this done much faster with vectorized operations and a cumulative product:

multipliers = rho[:-1] * (-1 - amp[1:]) / (rho[1:] * (1 - amp[1:])
cs = np.cumprod(np.insert(multipliers, 0, 1450))

(insert also requires allocating an entire new array, but it's okay, since we're only doing it once.)

Also, you probably have a sign error. Your formula says (amp(i) - 1) and your code says (1 - amp[i]). I've chosen to match your code, but you might need to correct that.

Upvotes: 2

pseudoDust

Reputation: 1386

np.append has to reallocate the whole array, witch is bad, but not the only problem. You are appending cs to c and not the other way around, this means that cs will be reversed and cs[i-1] is actually the first c.

It is better in general to pre-allocate your arrays:

cs = np.zeros(len(amp-1))

Then just set values directly:

cs[i] = c

Something like this should be a bit quicker:

cs=np.zeros(len(amp-1))

print "Using an initial guess of 1450 m/s"
cs[1] = (rho[i-1]*1450*(-1-amp[i]))/(rho[i]*(1-amp[i]))

for i in range(2,len(amp-1)):
  if twtt[i] == 0:
    print "Reached new cmp #: ",cmp_no[i],"as twwt has re-started at ",twtt[i]
    cs[i] = 1450
  else:
    print i
    cs[i] = (rho[i-1]*cs[i-1]*(-1-amp[i]))/(rho[i]*(1-amp[i]))

Upvotes: 0

mgilson

Reputation: 309929

Numpy arrays aren't really meant to be appended (numpy needs to allocate a full new array each time and copy the old data over). You probably don't want to be doing this in a loop.

It's better to use a data-structure that is meant for this sort of thing -- Generally python's list handles appending pretty well so I'd suggest you store the data in a list and append to it as you go. Then at the end, if you need the full dataset as an array, you can convert back at that point.

I'd recommend just changing to cs.append(c) instead of cs = np.append(c, cs)

Upvotes: 1

Python: Speeding up a slow for-loop calculation (np.append)

Answers (3)

Related Questions