How to fix the poor fitting of 1-D data?

Question

I have data set (1-D), with only one independent column. I would like to fit any model to it in order to sample from that model. The raw data Data set

I tried various theoretical distributions from Fitter package (here https://pypi.org/project/fitter/), none of them works fine. Then i tried Kernel Density Estimation using sklearn. It is good, but i could not prevent negative values due to the way it works. Finally, i tried log normal, but it is not really perfect.

Code for log normal here

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy
import math
from sklearn.metrics import  r2_score,mean_absolute_error,mean_squared_error

NN = 3915 # sample same number as original data set

df = pd.read_excel (r'Data_sets2.xlsx',sheet_name="Set1")

eps = 0.1  # Additional term for c

"""
    Estimate parameters of log(c) as normal distribution
"""
df["c"] = df["c"] + eps
mu = np.mean(np.log(df["c"]))
s  = np.std(np.log(df["c"]))
print("Mean:",mu,"std:",s)


def simulate(N):
    c = []
    for i in range(N):
        c_s = np.exp(np.random.normal(loc = mu, scale = s, size=1)[0])
        c.append(round(c_s))
    return (c)


predicted_c = simulate(NN)


XX=scipy.arange(3915)
### plot C relation ###
plt.scatter(XX,df["c"],color='g',label="Original data")
plt.scatter(XX,predicted_c,color='r',label="Sample data")
plt.xlabel('Index')
plt.ylabel('c')
plt.legend()
plt.show()

original vs samples

What i am looking for is how to improve the fitting, any suggestions or direction to models that may fit my data with a better accuracy is appreciated. Thanks

How to fix the poor fitting of 1-D data?

Answers (1)

Related Questions