Reputation: 157
I have a set of points representing the movement of a particle in 3D. I am trying to fit a polynomial to these points so that I have a single curve representing the track the particle has taken. Moreover, it is clear that the track is described by a polynomial of degree 2 or 3.
Following a suggestion here, I have written the following code to fit a polynomial to my data:
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
# Point cloud
data = np.array([[ 41. , 57. , 92. ],[ 39. , 57. , 92.4],[ 43. , 57. , 91.2], [ 23., 47. , 119.6],
[ 27. , 47. , 115.2], [ 25. , 45. , 122. ], [ 25. , 49. , 114. ],[ 29., 49. , 109.6],
[ 29. , 47. , 114.4], [ 27. , 49. , 111.2], [ 23. , 45. , 125.6], [ 31., 49., 106.8],
[ 25. , 47. , 117.6], [ 39. , 55. , 95.6],[ 37. , 53. , 98.4], [ 35. , 55. , 96.8],
[ 33. , 53. , 116.8], [ 23. , 43. , 132.8], [ 25. , 41. , 145.2],[ 25. , 43., 133.6],
[ 29. , 51. , 106.4],[ 31. , 53. , 121.2],[ 31., 51. , 104.8],[ 41., 55., 93.6],
[ 33. , 51. , 103.6],[ 35. , 53. , 99.6],[ 37. , 55. , 96.4]])
x = data[:,0]
y = data[:,1]
z = data[:,2]
# sort data to avoid plotting problems
x, y, z = zip(*sorted(zip(x, y, z)))
x = np.array(x)
y = np.array(y)
z = np.array(z)
# stack x and y as the two regression features
data_xy = np.array([x, y])
# quadratic polynomial features in x and y
poly = PolynomialFeatures(degree=2)
X_t = poly.fit_transform(data_xy.transpose())
clf = LinearRegression()
clf.fit(X_t, z)  # fit z as a function of (x, y)
z_pred = clf.predict(X_t)
print(clf.coef_)
print(clf.intercept_)
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.plot(x, y, z_pred, 'r')  # fitted curve
ax.scatter(x, y, z)         # original points
fig.set_dpi(150)
The problem is that I am getting a very strange result:
Any idea what's going on?
EDIT: I would expect a curve that fits the data. For example, I applied the exact same method to a different data set, and this is what I got:
Suggestions for other methods are also welcome. Thank you!
Upvotes: 2
Views: 2998
Reputation: 157
I was able to solve the problem. The issue was my choice of dependent and independent variables. If I treat X as the independent variable and Y and Z as the dependent variables, I get better results:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
def polynomial_regression3d(x, y, z, degree):
    # sort data by x to avoid plotting problems
    x, y, z = zip(*sorted(zip(x, y, z)))
    x = np.array(x)
    y = np.array(y)
    z = np.array(z)
    # stack y and z as the two dependent variables
    data_yz = np.array([y, z]).transpose()
    # polynomial features in the independent variable x
    polynomial_features = PolynomialFeatures(degree=degree)
    x_poly = polynomial_features.fit_transform(x[:, np.newaxis])
    model = LinearRegression()
    model.fit(x_poly, data_yz)
    y_poly_pred = model.predict(x_poly)
    rmse = np.sqrt(mean_squared_error(data_yz, y_poly_pred))
    r2 = r2_score(data_yz, y_poly_pred)
    print("RMSE:", rmse)
    print("R-squared:", r2)
    # plot the points and the fitted curve
    fig = plt.figure()
    ax = plt.axes(projection='3d')
    ax.scatter(x, data_yz[:, 0], data_yz[:, 1])
    ax.plot(x, y_poly_pred[:, 0], y_poly_pred[:, 1], color='r')
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    ax.set_zlabel('Z')
    fig.set_dpi(150)  # set dpi before showing the figure
    plt.show()
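For instance, the function can be called on the data array from the question like this (pass degree=3 for a cubic fit):

x, y, z = data[:, 0], data[:, 1], data[:, 2]
polynomial_regression3d(x, y, z, degree=2)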
However, the R-squared value is about 0.8, which is not bad but could perhaps be improved. Maybe weighted least squares would help.
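For reference, here is a minimal sketch of that idea, assuming the weights are derived from the residuals of an initial unweighted fit; the weighting scheme and the helper name weighted_polynomial_regression3d are purely illustrative:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

def weighted_polynomial_regression3d(x, y, z, degree):
    # sort by the independent variable x
    x, y, z = zip(*sorted(zip(x, y, z)))
    x = np.array(x)
    data_yz = np.array([y, z]).transpose()
    x_poly = PolynomialFeatures(degree=degree).fit_transform(x[:, np.newaxis])
    model = LinearRegression()
    model.fit(x_poly, data_yz)  # initial unweighted fit
    # down-weight points with large residuals (illustrative scheme)
    residuals = np.linalg.norm(data_yz - model.predict(x_poly), axis=1)
    weights = 1.0 / (1.0 + residuals**2)
    model.fit(x_poly, data_yz, sample_weight=weights)  # weighted refit
    return model.predict(x_poly)

LinearRegression.fit accepts a sample_weight argument, so no extra library is needed; whether this actually raises the R-squared depends on how the noise in the data is distributed.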
Upvotes: 2