Reputation: 4482
I have the following data:
import numpy as np
x = np.random.normal(100, 20, 100)  # these data points come from a normal distribution, but they could come from any distribution
y = np.random.normal(110, 20, 100)  # these data points come from a normal distribution, but they could come from any distribution
With the help of plotly-express, I can plot their joint distribution:
import plotly.express as px
fig = px.density_contour(None, x=x, y=y)
fig.update_traces(contours_coloring="fill", contours_showlabels=True)
fig.show()
I am looking for a way to randomly sample n observations from the distribution shown in the plot above, which is unknown.
How could I do that?
Upvotes: 0
Views: 345
Reputation: 3780
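Here's a quick way using scikit-learn's KernelDensity: fit a kernel density estimate to your data, then sample from it. The hard part is finding hyperparameters (mainly the bandwidth) that fit your needs.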
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.neighbors import KernelDensity
x = np.random.normal(100, 20, 100)  # these data points come from a normal distribution, but they could come from any distribution
y = np.random.normal(110, 20, 100)
S = np.vstack((x, y)).T  # stack the samples into an array of shape (N, 2)
# Fit a Gaussian kernel density estimate to the observed points
kde = KernelDensity(bandwidth=2, rtol=0.01)
kde.fit(S)
# Draw 100 new observations from the fitted density
new_data = kde.sample(100, random_state=0)
sns.kdeplot(x=S[:,0], y=S[:,1], cmap="coolwarm", fill=True)
plt.title("Original Distribution")
plt.show()
sns.kdeplot(x=new_data[:,0], y=new_data[:,1], cmap="coolwarm", fill=True)
plt.title("KDE Distribution")
plt.show()
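As for the hyperparameters, one common option is to pick the bandwidth by cross-validated log-likelihood rather than by hand. Below is a minimal sketch of that idea, not a definitive recipe: the candidate range `np.linspace(2, 30, 15)`, the 5-fold split, and `n = 250` are assumptions for illustration and should be adapted to the scale of your data.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

# Same kind of toy data as above, seeded so the run is repeatable
rng = np.random.default_rng(0)
x = rng.normal(100, 20, 100)
y = rng.normal(110, 20, 100)
S = np.column_stack((x, y))  # shape (N, 2)

# Candidate bandwidths (assumed range, chosen only for this example)
bandwidths = np.linspace(2, 30, 15)

# KernelDensity.score returns the total log-likelihood of held-out data,
# so GridSearchCV keeps the bandwidth with the best cross-validated score
search = GridSearchCV(
    KernelDensity(kernel="gaussian"),
    {"bandwidth": bandwidths},
    cv=5,
)
search.fit(S)

best_kde = search.best_estimator_  # refit on all of S with the best bandwidth

# Draw n new observations from the fitted density
n = 250
new_data = best_kde.sample(n, random_state=0)

print("selected bandwidth:", search.best_params_["bandwidth"])
print("sample shape:", new_data.shape)
```

Note that `sample` is only implemented for the "gaussian" and "tophat" kernels, so the search is kept to the Gaussian kernel here.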
Upvotes: 1