Zen

Reputation: 157

Fit polynomial to point cloud in 3D using PolynomialFeatures

I have some points that represent the movement of a particle in 3D. I am trying to fit a polynomial to these points so that I have a single curve representing the track the particle has taken. Moreover, it is clear that the track is a polynomial of degree 2 or 3.

Following a suggestion here, I have written the following code to fit a polynomial to my data:

%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Point cloud
data = np.array([[41., 57., 92. ], [39., 57., 92.4], [43., 57., 91.2], [23., 47., 119.6],
                 [27., 47., 115.2], [25., 45., 122. ], [25., 49., 114. ], [29., 49., 109.6],
                 [29., 47., 114.4], [27., 49., 111.2], [23., 45., 125.6], [31., 49., 106.8],
                 [25., 47., 117.6], [39., 55., 95.6], [37., 53., 98.4], [35., 55., 96.8],
                 [33., 53., 116.8], [23., 43., 132.8], [25., 41., 145.2], [25., 43., 133.6],
                 [29., 51., 106.4], [31., 53., 121.2], [31., 51., 104.8], [41., 55., 93.6],
                 [33., 51., 103.6], [35., 53., 99.6], [37., 55., 96.4]])

x = data[:,0]
y = data[:,1]
z = data[:,2]

# sort data to avoid plotting problems
x, y, z = zip(*sorted(zip(x, y, z)))

x = np.array(x)
y = np.array(y)
z = np.array(z)

data_xy = np.array([x,y])

poly = PolynomialFeatures(degree=2)
X_t = poly.fit_transform(data_xy.transpose())

clf = LinearRegression()
clf.fit(X_t, z)
z_pred = clf.predict(X_t)
print(clf.coef_)
print(clf.intercept_)

fig = plt.figure()
ax = plt.subplot(projection='3d')
ax.plot(x, y, z_pred, 'r') # fit line
ax.scatter(x, y, z)
fig.set_dpi(150)
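
As an aside, the order of the coefficients printed by clf.coef_ matches the generated polynomial feature columns, which can be checked directly (a quick sketch, assuming scikit-learn >= 1.0 for get_feature_names_out):

# columns of X_t, in the same order as clf.coef_
print(poly.get_feature_names_out())  # ['1' 'x0' 'x1' 'x0^2' 'x0 x1' 'x1^2']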

The problem is that I am getting a very strange result:

[figure: 3D scatter of the points with the fitted red curve]

Any idea what's going on?

EDIT: I would expect a curve that fits the data. For example, I applied the exact same method to different data, and this is what I got:

[figure: the same method applied to a different point cloud, producing a good fit]

Suggestions for other methods are also welcome. Thank you!

Upvotes: 2

Views: 2998

Answers (1)

Zen

Reputation: 157

I was able to solve the problem. The issue was my choice of dependent and independent variables: if I take X as the independent variable and treat Y and Z as dependent, I get much better results:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

def polynomial_regression3d(x, y, z, degree):
    # sort data to avoid plotting problems
    x, y, z = zip(*sorted(zip(x, y, z)))

    x = np.array(x)
    y = np.array(y)
    z = np.array(z)
    
    # stack the two dependent variables (y, z) as a multi-output target
    data_yz = np.array([y, z]).transpose()

    # polynomial features of the single independent variable x
    polynomial_features = PolynomialFeatures(degree=degree)
    x_poly = polynomial_features.fit_transform(x[:, np.newaxis])

    # one multi-output linear regression predicts y and z jointly
    model = LinearRegression()
    model.fit(x_poly, data_yz)
    y_poly_pred = model.predict(x_poly)

    rmse = np.sqrt(mean_squared_error(data_yz, y_poly_pred))
    r2 = r2_score(data_yz, y_poly_pred)
    print("RMSE:", rmse)
    print("R-squared:", r2)
    
    # plot
    fig = plt.figure()
    ax = plt.axes(projection='3d')
    ax.scatter(x, data_yz[:,0], data_yz[:,1])
    ax.plot(x, y_poly_pred[:,0], y_poly_pred[:,1], color='r')
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    ax.set_zlabel('Z')

    fig.set_dpi(150)
    plt.show()
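
Called on the data from the question, for example:

polynomial_regression3d(data[:, 0], data[:, 1], data[:, 2], degree=2)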

However, the computed R-squared is about 0.8, which is not so bad but could perhaps be improved. Maybe weighted least squares can help.
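
For example, here is a minimal sketch of one reweighting pass that could replace the plain fit inside the function above. LinearRegression.fit does accept a sample_weight argument, but the weighting scheme here (down-weighting points with large residuals from the unweighted fit) is a hypothetical choice, not something tested on this data:

# inside polynomial_regression3d, after the first fit:
# hypothetical weights that shrink with the residual of the unweighted fit
residuals = np.linalg.norm(data_yz - y_poly_pred, axis=1)
weights = 1.0 / (1.0 + residuals**2)

model = LinearRegression()
model.fit(x_poly, data_yz, sample_weight=weights)  # weighted least squares
y_poly_pred = model.predict(x_poly)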

Upvotes: 2
