Reputation: 117
I have an input file called "cmp_twtt_amp_rho" which is 7795074 lines long. I would like to calculate sound speed, c, for each line where:
c(i) = rho(i-1) * c(i-1) * (-1-amp(i)) / rho(i) * (amp(i)-1)
using an initial guess of c=1450.
I have written a for loop that I believe will work, however it gets increasing slower with time, so much so that it is not conceivable to run in it's current format.
Could someone help me speed up this piece of code please?
data=np.genfromtxt('./cmp_twtt_amp_rho')
cmp_no=data[:,[0]]
twtt=data[:,[1]]
amp=data[:,[2]]
rho=data[:,[3]]
cs=[]
for i in range(1,len(amp-1)):
if i == 1:
print "Using an initial guess of 1450 m/s"
c = (rho[i-1]*1450*(-1-amp[i]))/(rho[i]*(1-amp[i]))
cs = np.append(c,cs)
elif twtt[i] == 0:
print "Reached new cmp #: ",cmp_no[i],"as twwt has re-started at ",twtt[i]
c = 1450
cs = np.append(c,cs)
else:
print i
c = (rho[i-1]*cs[i-1]*(-1-amp[i]))/(rho[i]*(1-amp[i]))
cs = np.append(c,cs)
print min(cs), max(cs)
print len(cs)
Upvotes: 0
Views: 1514
Reputation: 280688
Appending to arrays is really slow, since you have to allocate a whole new array every time. Doing it in a loop is pretty much always going to kill performance.
Instead of appending in a loop, or even using a Python-level loop at all, you can get this done much faster with vectorized operations and a cumulative product:
multipliers = rho[:-1] * (-1 - amp[1:]) / (rho[1:] * (1 - amp[1:])
cs = np.cumprod(np.insert(multipliers, 0, 1450))
(insert
also requires allocating an entire new array, but it's okay, since we're only doing it once.)
Also, you probably have a sign error. Your formula says (amp(i) - 1)
and your code says (1 - amp[i])
. I've chosen to match your code, but you might need to correct that.
Upvotes: 2
Reputation: 1386
np.append
has to reallocate the whole array, witch is bad, but not the only problem. You are appending cs
to c
and not the other way around, this means that cs
will be reversed and cs[i-1]
is actually the first c
.
It is better in general to pre-allocate your arrays:
cs = np.zeros(len(amp-1))
Then just set values directly:
cs[i] = c
Something like this should be a bit quicker:
cs=np.zeros(len(amp-1))
print "Using an initial guess of 1450 m/s"
cs[1] = (rho[i-1]*1450*(-1-amp[i]))/(rho[i]*(1-amp[i]))
for i in range(2,len(amp-1)):
if twtt[i] == 0:
print "Reached new cmp #: ",cmp_no[i],"as twwt has re-started at ",twtt[i]
cs[i] = 1450
else:
print i
cs[i] = (rho[i-1]*cs[i-1]*(-1-amp[i]))/(rho[i]*(1-amp[i]))
Upvotes: 0
Reputation: 309929
Numpy arrays aren't really meant to be appended (numpy needs to allocate a full new array each time and copy the old data over). You probably don't want to be doing this in a loop.
It's better to use a data-structure that is meant for this sort of thing -- Generally python's list
handles appending pretty well so I'd suggest you store the data in a list and append to it as you go. Then at the end, if you need the full dataset as an array, you can convert back at that point.
I'd recommend just changing to cs.append(c)
instead of cs = np.append(c, cs)
Upvotes: 1