Luca

Reputation: 10996

Visualizing high-dimensional data in matplotlib/Python

I am trying to use Gaussian Processes to fit smooth functions to some data points. I am using the scikit-learn library for Python; in my case, the inputs are two-dimensional spatial coordinates and the outputs are transformed versions of these, also 2-D spatial coordinates. I generated some dummy test data and tried to fit a GP model to it. The code that I used was as follows:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

# Some dummy data: 10 random 2-D inputs and their elementwise sine
X = np.random.rand(10, 2)
Y = np.sin(X)

# Use the squared exponential kernel
kernel = C(1.0, (1e-3, 1e3)) * RBF(10, (1e-2, 1e2))
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=9)
# Fit to data using Maximum Likelihood Estimation of the parameters
gp.fit(X, Y)

# Evaluate on a single test point
test = np.array([[1.56, 0.92]])
y_pred, sigma = gp.predict(test, return_std=True)
print(test, np.sin(test))  # The true value
print(y_pred, sigma)       # The predicted value and the std. dev.

I was wondering if there is a good way to visualize the model fit. Since both my inputs and outputs are 2-D, I am not sure how to visualize it quickly to get an idea of the fit (in particular, I want to see the smoothness and variance of the model prediction between the training points). Most examples online are, of course, for the 1-D case.

Upvotes: 0

Views: 1042

Answers (1)

Pratik Kumar

Reputation: 2231

I assume what you need is Principal Component Analysis (PCA), a statistical technique that reduces the dimensionality of a dataset while preserving as much of the high-dimensional variance as possible in the low-dimensional representation.

In Python:

from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Project the 2-D inputs and outputs onto their first principal component
pca_x = PCA(n_components=1)
X1D = pca_x.fit_transform(X)

pca_y = PCA(n_components=1)
Y1D = pca_y.fit_transform(Y)

# Scatter rather than plot: the samples have no meaningful ordering
plt.scatter(X1D, Y1D)
plt.show()

Use n_components=d, where d is the required reduced dimension.
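As a quick check that a 1-D projection is faithful, PCA's explained_variance_ratio_ reports the fraction of the original variance each kept component retains (a minimal sketch; the random X here stands in for the question's input data):

from sklearn.decomposition import PCA
import numpy as np

X = np.random.rand(10, 2)  # stand-in for the question's inputs
pca_x = PCA(n_components=1)
X1D = pca_x.fit_transform(X)

# Fraction of total variance captured by the single retained component
print(pca_x.explained_variance_ratio_)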

Link to PCA in sklearn: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html

An alternative could be t-distributed Stochastic Neighbor Embedding (t-SNE), which is also used to visualize high-dimensional data; a Python implementation is available as sklearn.manifold.TSNE. A minimal sketch of that route is shown below.
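This sketch assumes sklearn.manifold.TSNE and uses random stand-ins for the question's X and Y; note that perplexity must be smaller than the number of samples, which matters with only 10 points:

from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import numpy as np

# Stand-ins for the question's data
X = np.random.rand(10, 2)
Y = np.sin(X)

# Treat each (input, output) pair as one 4-D sample and embed it in 2-D;
# perplexity must be < n_samples, so lower it for 10 points
data = np.hstack([X, Y])  # shape (10, 4)
emb = TSNE(n_components=2, perplexity=5).fit_transform(data)

plt.scatter(emb[:, 0], emb[:, 1])
plt.show()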

Upvotes: 1
