Reputation: 3244
I'm trying to interpolate my set of data (first columnt is the time, third columnt is the actual data):
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
data = np.genfromtxt("data.csv", delimiter=" ")
x = data[:, 0]
y = data[:, 2]
xx = np.linspace(x.min(), x.max(), 1000)
y_smooth = interp1d(x, y)(xx)
#y_smooth = interp1d(x, y, kind="cubic")(xx)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(xx, y_smooth, "r-")
plt.show()
but I see some strange difference between linear and cubic interpolation.
Here is the result for linear:
Here is the same for cubic:
I'm not sure, why is graph jumping all the time and y_smooth
contains incorrect values?
ipdb> y_smooth_linear.max()
141.5481144
ipdb> y_smooth_cubic.max()
1.2663431888584225e+18
Can anybody explain to me, how can I change my code to achieve correct interpolation?
UPD: here is data.cvs
file
Upvotes: 2
Views: 1924
Reputation: 879869
Given cfh's observation that x
has duplicate values, you could use np.unique
to select a unique value of y
for each x
:
x2, idx = np.unique(x, return_index=True)
y2 = y[idx]
return_index=True
causes np.unique to return not only the unique values, x2
, but also the locations, idx
, of the unique x
s in the original x
array. Note that this selects the first value of y
for each unique x
.
If you'd like to average all the y
values for each unique x
, you could use
stats.binned_statistic:
import scipy.stats as stats
x2, inv = np.unique(x, return_inverse=True)
y2, bin_edges, binnumber = stats.binned_statistic(
x=inv, values=y, statistic='mean', bins=inv.max()+1)
return_inverse=True
tells np.unique
to return indices from which the
original array can be reconstructed. Those indices can also serve as categorical
labels or "factors", which is how they are being used in the call to
binned_statistic
above.
import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
import scipy.stats as stats
data = np.genfromtxt("data.csv", delimiter=" ")
x = data[:, 0]
y = data[:, 1]
x2, idx, inv = np.unique(x, return_index=True, return_inverse=True)
y_uniq = y[idx]
y_ave, bin_edges, binnumber = stats.binned_statistic(
x=inv, values=y, statistic='mean', bins=inv.max()+1)
xx = np.linspace(x.min(), x.max(), 1000)
y_smooth = interp1d(x, y)(xx)
y_smooth2 = interp1d(x2, y_uniq, kind="cubic")(xx)
y_smooth3 = interp1d(x2, y_ave, kind="cubic")(xx)
fig, ax = plt.subplots(nrows=3, sharex=True)
ax[0].plot(xx, y_smooth, "r-", label='linear')
ax[1].plot(xx, y_smooth2, "b-", label='cubic (first y)')
ax[2].plot(xx, y_smooth3, "b-", label='cubic (ave y)')
ax[0].legend(loc='best')
ax[1].legend(loc='best')
ax[2].legend(loc='best')
plt.show()
Upvotes: 2
Reputation: 4666
Your data contains several y values for the same x value. This violates the assumptions of most interpolation algorithms.
Either discard the rows with duplicate x values, average the y values for each individual x, or obtain a better resolution for the x values such that they aren't the same anymore.
Upvotes: 4