How to randomly sample from unknown joint distribution in python

Question

I have the following data

import numpy as np
x = np.random.normal(100, 20, 100) # these data points come from a normal distribution, but they could come from any distribution
y = np.random.normal(110, 20, 100) # these data points come from a normal distribution, but they could come from any distribution

With the help of plotly-express I can plot their joint distribution:

import plotly.express as px
fig = px.density_contour(None, x=x, y=y)
fig.update_traces(contours_coloring="fill", contours_showlabels = True)
fig.show()

I am looking for a way to randomly sample n observations from the distribution shown in the plot above (which is unknown).

How could I do that?

Daraan · Accepted Answer

Here's a quick way using scikit-learn's kernel density estimation. The hard part is finding hyperparameters, such as the bandwidth, that fit your needs.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.neighbors import KernelDensity

x = np.random.normal(100, 20, 100) # these data points come from a normal distribution, but they could come from any distribution
y = np.random.normal(110, 20, 100)
S = np.vstack((x, y)).T # stack the samples into an array of shape (N, 2)

kde = KernelDensity(bandwidth=2, rtol=0.01) # Gaussian kernel by default
kde.fit(S)
new_data = kde.sample(100, random_state=1) # draw 100 new points from the fitted density

sns.kdeplot(x=S[:,0], y=S[:,1], cmap="coolwarm", fill=True)
plt.title("Original Distribution")
plt.show()

sns.kdeplot(x=new_data[:,0], y=new_data[:,1], cmap="coolwarm", fill=True)
plt.title("KDE Distribution")
plt.show()
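
Since the bandwidth is the hard part, one option is to choose it by cross-validated log-likelihood instead of hard-coding `bandwidth=2`. The block below is only a sketch under stated assumptions: the candidate range (1 to roughly 30) is a guess suited to data with a spread of about 20, the 5-fold split is arbitrary, and the data are regenerated with a fixed seed so the block runs on its own.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

# Regenerate example data with a fixed seed (assumption: same shape as above)
rng = np.random.default_rng(0)
x = rng.normal(100, 20, 100)
y = rng.normal(110, 20, 100)
S = np.column_stack((x, y))  # shape (100, 2)

# Candidate bandwidths on a log scale; this range is an assumption
bandwidths = np.logspace(0, 1.5, 20)

# KernelDensity.score returns the total log-likelihood of held-out points,
# so GridSearchCV keeps the bandwidth that generalizes best across folds
search = GridSearchCV(
    KernelDensity(kernel="gaussian"),
    {"bandwidth": bandwidths},
    cv=5,
)
search.fit(S)

best_kde = search.best_estimator_  # refit on all of S with the best bandwidth
new_data = best_kde.sample(100, random_state=0)

print("selected bandwidth:", search.best_params_["bandwidth"])
print("sample shape:", new_data.shape)
```

Note that `sample` is only implemented for the Gaussian and tophat kernels, so the search is restricted to one of those if you want to draw new points afterwards.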
