quant

Reputation: 4482

How to randomly sample from an unknown joint distribution in Python

I have the following data:

import numpy as np
x = np.random.normal(100, 20, 100)  # drawn from a normal here, but the data could come from any distribution
y = np.random.normal(110, 20, 100)  # drawn from a normal here, but the data could come from any distribution

With the help of plotly-express I can plot their joint distribution:

import plotly.express as px
fig = px.density_contour(None, x=x, y=y)
fig.update_traces(contours_coloring="fill", contours_showlabels = True)
fig.show()


I am looking for a way to randomly sample n observations from the (unknown) distribution shown in the plot above.

How could I do that?

Upvotes: 0

Views: 345

Answers (1)

Daraan

Reputation: 3780

Here's a quick way using scikit-learn: fit a kernel density estimate (KDE) to the data and sample from it. The hard part is finding hyperparameters (mainly the bandwidth) that fit your needs.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.neighbors import KernelDensity

x = np.random.normal(100, 20, 100)  # drawn from a normal here, but the data could come from any distribution
y = np.random.normal(110, 20, 100)
S = np.vstack((x, y)).T  # stack the samples into an array of shape (N, 2)

# Fit a Gaussian kernel density estimate to the stacked samples
kde = KernelDensity(bandwidth=2, rtol=0.01)
kde.fit(S)

# Draw 100 new points from the fitted density
new_data = kde.sample(100, random_state=1)

sns.kdeplot(x=S[:,0], y=S[:,1], cmap="coolwarm", fill=True)
plt.title("Original Distribution")
plt.show()

sns.kdeplot(x=new_data[:,0], y=new_data[:,1], cmap="coolwarm", fill=True)
plt.title("KDE Distribution")
plt.show()
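
To see what sampling from a Gaussian KDE amounts to, here is a minimal NumPy-only sketch. As far as I understand it, this mirrors what kde.sample does for the default Gaussian kernel: pick a training point at random, then add Gaussian noise whose standard deviation is the bandwidth. The function name sample_gaussian_kde is made up for illustration and is not part of scikit-learn.

```python
import numpy as np

def sample_gaussian_kde(data, bandwidth, n_samples, seed=None):
    """Draw n_samples points from a Gaussian KDE built on data of shape (N, D).

    Hypothetical helper for illustration; not a scikit-learn function.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Pick n_samples training points uniformly at random, with replacement.
    centers = data[rng.integers(0, len(data), size=n_samples)]
    # Add isotropic Gaussian noise with standard deviation equal to the bandwidth.
    return centers + rng.normal(0.0, bandwidth, size=centers.shape)

rng = np.random.default_rng(0)
S = np.column_stack((rng.normal(100, 20, 100), rng.normal(110, 20, 100)))
new_data = sample_gaussian_kde(S, bandwidth=2, n_samples=100, seed=1)
print(new_data.shape)
```

This also shows why the bandwidth matters: a small value gives samples that hug the original points, while a large value smears them out.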
    
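One common way to pick the bandwidth is cross-validation on the held-out log-likelihood. The sketch below uses GridSearchCV for that; the candidate grid (20 values between 1 and about 32) and cv=5 are assumptions on my part, so adjust them to the scale of your data.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
S = np.column_stack((rng.normal(100, 20, 100), rng.normal(110, 20, 100)))

# Candidate bandwidths are an assumption; widen or narrow the range for your data.
param_grid = {"bandwidth": np.logspace(0, 1.5, 20)}

# KernelDensity.score returns the total log-likelihood of the data it is given,
# so the search keeps the bandwidth with the best held-out likelihood across folds.
search = GridSearchCV(KernelDensity(kernel="gaussian"), param_grid, cv=5)
search.fit(S)

best_bw = search.best_params_["bandwidth"]
new_data = search.best_estimator_.sample(100, random_state=0)
print(best_bw, new_data.shape)
```

With only 100 points the selected bandwidth can vary noticeably between runs, so treat it as a starting point and check the resulting plot.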


Upvotes: 1
