erthalion
erthalion

Reputation: 3244

scipy: interpolation, cubic & linear

I'm trying to interpolate my set of data (first columnt is the time, third columnt is the actual data):

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d

data = np.genfromtxt("data.csv", delimiter=" ")

x = data[:, 0]
y = data[:, 2]

xx = np.linspace(x.min(), x.max(), 1000)
y_smooth = interp1d(x, y)(xx)
#y_smooth = interp1d(x, y, kind="cubic")(xx)

fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(xx, y_smooth, "r-")

plt.show()

but I see some strange difference between linear and cubic interpolation. Here is the result for linear: enter image description here

Here is the same for cubic: enter image description here

I'm not sure, why is graph jumping all the time and y_smooth contains incorrect values?

ipdb> y_smooth_linear.max()
141.5481144
ipdb> y_smooth_cubic.max()
1.2663431888584225e+18

Can anybody explain to me, how can I change my code to achieve correct interpolation?

UPD: here is data.cvs file

Upvotes: 2

Views: 1924

Answers (2)

unutbu
unutbu

Reputation: 879869

Given cfh's observation that x has duplicate values, you could use np.unique to select a unique value of y for each x:

x2, idx = np.unique(x, return_index=True)
y2 = y[idx]

return_index=True causes np.unique to return not only the unique values, x2, but also the locations, idx, of the unique xs in the original x array. Note that this selects the first value of y for each unique x.

If you'd like to average all the y values for each unique x, you could use stats.binned_statistic:

import scipy.stats as stats
x2, inv = np.unique(x, return_inverse=True)
y2, bin_edges, binnumber = stats.binned_statistic(
    x=inv, values=y, statistic='mean', bins=inv.max()+1)

return_inverse=True tells np.unique to return indices from which the original array can be reconstructed. Those indices can also serve as categorical labels or "factors", which is how they are being used in the call to binned_statistic above.


import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import interp1d
import scipy.stats as stats

data = np.genfromtxt("data.csv", delimiter=" ")

x = data[:, 0]
y = data[:, 1]

x2, idx, inv = np.unique(x, return_index=True, return_inverse=True)
y_uniq = y[idx]
y_ave, bin_edges, binnumber = stats.binned_statistic(
    x=inv, values=y, statistic='mean', bins=inv.max()+1)

xx = np.linspace(x.min(), x.max(), 1000)
y_smooth = interp1d(x, y)(xx)
y_smooth2 = interp1d(x2, y_uniq, kind="cubic")(xx)
y_smooth3 = interp1d(x2, y_ave, kind="cubic")(xx)

fig, ax = plt.subplots(nrows=3, sharex=True)

ax[0].plot(xx, y_smooth, "r-", label='linear')
ax[1].plot(xx, y_smooth2, "b-", label='cubic (first y)')
ax[2].plot(xx, y_smooth3, "b-", label='cubic (ave y)')
ax[0].legend(loc='best')
ax[1].legend(loc='best')
ax[2].legend(loc='best')
plt.show()

enter image description here

Upvotes: 2

cfh
cfh

Reputation: 4666

Your data contains several y values for the same x value. This violates the assumptions of most interpolation algorithms.

Either discard the rows with duplicate x values, average the y values for each individual x, or obtain a better resolution for the x values such that they aren't the same anymore.

Upvotes: 4

Related Questions