ZuLu

Reputation: 962

Simulating correlated multivariate data

I'm trying to generate synthetic realizations from historical hurricane data. A hurricane is parameterized in my problem by a set of descriptors (i.e. storm size, storm intensity, storm speed, and storm heading - all referenced to the values at the time the hurricane crosses some shoreline). The realizations will be used to make probabilistic forecasts of hurricane-generated flooding. The assumption is that the historical hurricane data comes from some underlying multivariate distribution. The idea is to draw additional samples from this underlying distribution (preserving moments, correlation, physical bounds such as positive storm size, etc).

I've implemented a nearest-neighbor Gaussian dispersion method adapted from a technique developed by Taylor and Thompson (published in Computational Statistics & Data Analysis, 1986). I'd like to see if there are better ways to do this.

Data sample (Gulf of Mexico hurricanes 1940-2005):

[image: Gulf of Mexico hurricanes 1940-2005]

import numpy
import scipy.spatial


def TT_alg(data_list, sample_size, num_neighbors=5, metric=2):
    """Taylor-Thompson nearest-neighbor Gaussian dispersion resampling."""
    data_array = numpy.array(data_list, dtype=float)

    # rescale each column (descriptor) to the interval [0, 1]
    bb = data_array.min(axis=0)
    aa = data_array.max(axis=0) - bb
    data_array_normed = (data_array - bb) / aa

    # set up the nearest-neighbor tree on the rescaled data
    tree = scipy.spatial.KDTree(data_array_normed)

    # perform the nearest-neighbor random walk
    samples = []
    for _ in range(sample_size):
        # pick a random historical point and find its k nearest neighbors
        seed = data_array_normed[numpy.random.randint(len(data_array_normed))]
        _, neighbor_idx = tree.query(seed, k=num_neighbors, p=metric)

        # draw one normal deviate per dimension, centered on the neighborhood
        # mean and scaled by the neighborhood standard deviation
        neighborhood = data_array_normed[neighbor_idx]
        x_mu = neighborhood.mean(axis=0)
        x_si = neighborhood.std(axis=0)
        samples.append(numpy.random.normal(x_mu, x_si))

    # map the simulated points back to the original scale
    samples_unnormed = numpy.array(samples) * aa + bb
    return samples_unnormed.tolist()
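
A typical call, with data_list holding one row of descriptors per historical storm (the sample size here is just illustrative):

    synthetic_storms = TT_alg(data_list, sample_size=1000, num_neighbors=5, metric=2)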

Example output for num_neighbors=5 and metric=2 (Euclidean distance):

[image: example simulation output]

Upvotes: 2

Views: 1581

Answers (1)

pjs

Reputation: 19855

Your data are almost certainly not Gaussian: speed, intensity, and size must all be positive, and size and intensity are clearly skewed. A log-normal distribution is plausible, so I'd recommend log-transforming your data before attempting distributional fits.

One way to try to capture the correlation structure (which is definitely present in your posted data!) would be to estimate the mean M and variance/covariance matrix V of the log-transformed data. Then decompose the variance/covariance matrix using Cholesky decomposition to get V = transpose(C) C. If Z is a vector of independent normals, then X = M + transpose(C) Z will be a vector of correlated normals with the desired mean and variance/covariance structure. Exponentiating the elements of X will yield your simulated results. The results should avoid artifacts such as the negative storm sizes visible in your last graph. See this paper for more details.
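
A minimal sketch of that recipe with NumPy, assuming data is a 2-D array with one row per historical storm and one strictly positive column per descriptor (the function name and arguments are just for illustration):

    import numpy

    def simulate_lognormal(data, sample_size):
        """Draw correlated lognormal samples via Cholesky factorization."""
        log_data = numpy.log(data)              # work in log space
        M = log_data.mean(axis=0)               # mean vector of the logs
        V = numpy.cov(log_data, rowvar=False)   # variance/covariance of the logs

        # NumPy returns the lower-triangular factor L with V = L L^T
        L = numpy.linalg.cholesky(V)

        # independent standard normals, one row per simulated storm
        Z = numpy.random.standard_normal((sample_size, len(M)))

        # correlated normals with mean M and covariance V, then exponentiate
        X = M + Z @ L.T
        return numpy.exp(X)

Note that numpy.linalg.cholesky returns the lower-triangular factor, so the multiplication is written with L.T rather than transpose(C) as above; the resulting covariance structure is the same.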

Upvotes: 1
