Gianluigi

Reputation: 1

How to bootstrap a value obtained by applying a function to two series of data with Python?

This is my first question. I have written a Python program that calculates a value from two series of data by applying a function. The code that computes the value I need is:

import numpy as np

def calculate_ratio(data1, data2):
    mean_product = np.mean(data1 * data2)
    
    mean_data1 = np.mean(data1)
    mean_data2 = np.mean(data2)
    mean_product_of_means = mean_data1 * mean_data2
    
    numerator = mean_product - mean_product_of_means  # covariance of data1 and data2
    
    mean_squared_data1 = np.mean(data1 ** 2)
    
    squared_mean_data1 = mean_data1 ** 2
    
    denominator = mean_squared_data1 - squared_mean_data1  # variance of data1
    
    ratio = numerator / denominator
    return ratio

data1 = np.array([1, 2, 3])
data2 = np.array([4, 5, 6])

result = calculate_ratio(data1, data2)
print(result)
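For reference: the numerator above is the population covariance of data1 and data2, and the denominator is the population variance of data1, so the ratio is the least-squares slope of data2 regressed on data1. A quick sanity check under that reading (a sketch, not part of the original post) compares it with `np.polyfit`:

```python
import numpy as np

def calculate_ratio(data1, data2):
    # Population covariance of data1 and data2 divided by population variance of data1
    numerator = np.mean(data1 * data2) - np.mean(data1) * np.mean(data2)
    denominator = np.mean(data1 ** 2) - np.mean(data1) ** 2
    return numerator / denominator

data1 = np.array([1, 2, 3])
data2 = np.array([4, 5, 6])

ratio = calculate_ratio(data1, data2)
# With deg=1, np.polyfit returns [slope, intercept] of the least-squares line
slope, intercept = np.polyfit(data1, data2, 1)
print(ratio, slope)
print(np.isclose(ratio, slope))  # True
```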

I need to bootstrap this value with Python to obtain its confidence interval.

I don't understand how to do it. I'm very new to programming, and I'm trying to use ChatGPT for help, but what it suggests doesn't work. This is what ChatGPT suggests:


def calculate_ratio(data1, data2):
    mean_product = np.mean(data1 * data2)
    product_of_means = np.mean(data1) * np.mean(data2)
    numerator = mean_product - product_of_means
    mean_squared_data1 = np.mean(data1 ** 2)
    squared_mean_data2 = np.mean(data2) ** 2
    denominator = mean_squared_data1 - squared_mean_data2
    ratio = numerator / denominator
    return ratio

def bootstrap_ratio(data1, data2, func, num_iterations):
    results = []
    n = len(data1)
    for _ in range(num_iterations):
        indices1 = np.random.choice(n, n, replace=True)
        indices2 = np.random.choice(n, n, replace=True)
        data1_bootstrap = data1[indices1]
        data2_bootstrap = data2[indices2]
        result = func(data1_bootstrap, data2_bootstrap)
        results.append(result)
    return results

data1 = np.array([1, 2, 3, 4, 5])
data2 = np.array([5, 4, 3, 2, 1])

num_iterations = 1000

bootstrap_results = bootstrap_ratio(data1, data2, calculate_ratio, num_iterations)

confidence_interval = np.percentile(bootstrap_results, [2.5, 97.5])

print(confidence_interval)

What is wrong with this code? (I have installed NumPy.) What should I modify to make it work?

Thank you for your attention, and sorry if I used incorrect words.

Upvotes: 0

Views: 83

Answers (2)

Gianluigi

Reputation: 1

I've solved the problem: the resulting range was wrong because there were two different index arrays, so the function was applied to random values of data1 and unrelated random values of data2. The function has to operate on random values of data1 and the CORRESPONDING values of data2. Replacing indices1 and indices2 with a single indices array makes the range sensible.
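To see why the pairing matters, here is a small illustrative sketch (the data and the seed are made up for the demonstration). When data2 is an exact linear function of data1, every paired resample keeps that relationship, so its ratio is 1; resampling the two arrays independently scrambles the pairs and the ratios scatter widely. Resamples where data1 happens to be constant have a zero denominator and are dropped.

```python
import numpy as np

def calculate_ratio(data1, data2):
    numerator = np.mean(data1 * data2) - np.mean(data1) * np.mean(data2)
    denominator = np.mean(data1 ** 2) - np.mean(data1) ** 2
    return numerator / denominator

rng = np.random.default_rng(0)  # fixed seed, illustrative only
data1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
data2 = data1 + 3.0  # exact linear relationship with slope 1

n = len(data1)
paired, independent = [], []
# A resample can repeat one value n times, giving 0/0 or x/0; silence those warnings
with np.errstate(divide="ignore", invalid="ignore"):
    for _ in range(1000):
        idx = rng.integers(0, n, size=n)  # one index array for both series
        paired.append(calculate_ratio(data1[idx], data2[idx]))
        idx1 = rng.integers(0, n, size=n)  # two unrelated index arrays
        idx2 = rng.integers(0, n, size=n)
        independent.append(calculate_ratio(data1[idx1], data2[idx2]))

paired = np.array(paired)
independent = np.array(independent)
# Keep only resamples with a finite ratio (data1 not constant)
paired = paired[np.isfinite(paired)]
independent = independent[np.isfinite(independent)]
print("paired:      min %.3f max %.3f" % (paired.min(), paired.max()))
print("independent: min %.3f max %.3f" % (independent.min(), independent.max()))
```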

So:

def bootstrap_ratio(data1, data2, func, num_iterations):
    results = []
    n = len(data1)
    for _ in range(num_iterations):
        indices1 = np.random.choice(n, n, replace=True)
        indices2 = np.random.choice(n, n, replace=True)
        data1_bootstrap = data1[indices1]
        data2_bootstrap = data2[indices2]
        result = func(data1_bootstrap, data2_bootstrap)
        results.append(result)
    return results

becomes

def bootstrap_ratio(data1, data2, func, num_iterations):
    results = []
    n = len(data1)
    for _ in range(num_iterations):
        # A single index array resamples (data1[i], data2[i]) pairs together
        indices = np.random.choice(n, n, replace=True)
        data1_bootstrap = data1[indices]
        data2_bootstrap = data2[indices]
        result = func(data1_bootstrap, data2_bootstrap)
        results.append(result)
    return results
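Putting the fix together, a minimal end-to-end sketch might look like the following. It assumes the ratio from the first snippet in the question; note that the calculate_ratio suggested by ChatGPT subtracts np.mean(data2) ** 2 in the denominator instead of np.mean(data1) ** 2, which is a different quantity. The sketch resamples pairs with one index array and takes a 95% percentile interval. Using NumPy's `default_rng`/`integers` instead of `np.random.choice`, the `seed` parameter, and dropping resamples where data1 is constant (zero variance, so the ratio is undefined) are my own choices, not part of the answer above.

```python
import numpy as np

def calculate_ratio(data1, data2):
    # Covariance of data1 and data2 divided by the variance of data1
    numerator = np.mean(data1 * data2) - np.mean(data1) * np.mean(data2)
    denominator = np.mean(data1 ** 2) - np.mean(data1) ** 2
    return numerator / denominator

def bootstrap_ratio(data1, data2, func, num_iterations, seed=None):
    rng = np.random.default_rng(seed)
    n = len(data1)
    results = []
    with np.errstate(divide="ignore", invalid="ignore"):
        for _ in range(num_iterations):
            # One index array for both series keeps each (data1[i], data2[i]) pair together
            indices = rng.integers(0, n, size=n)
            results.append(func(data1[indices], data2[indices]))
    results = np.asarray(results)
    # Drop degenerate resamples (constant data1 gives a zero denominator)
    return results[np.isfinite(results)]

data1 = np.array([1, 2, 3, 4, 5])
data2 = np.array([5, 4, 3, 2, 1])

bootstrap_results = bootstrap_ratio(data1, data2, calculate_ratio, 1000, seed=42)
confidence_interval = np.percentile(bootstrap_results, [2.5, 97.5])
print(confidence_interval)
```

Because data2 here is exactly 6 - data1, every paired resample gives a ratio of about -1, so the interval collapses to roughly [-1, -1]; with real, noisy data it will have a nonzero width.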

Upvotes: 0

darren

Reputation: 5744

The question does not look like a bootstrap process in the sense of one where each value is computed from earlier values, so the title is a bit misleading in that regard.

The code implements mathematics that are not shown in the question, so it is difficult to tell whether it is correct.
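For reference, the formula can be read directly off calculate_ratio. Writing x for data1, y for data2, and bars for means over the n samples, the code computes:

```latex
\text{ratio}
  = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^{2}} - \bar{x}^{2}}
  = \frac{\tfrac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\tfrac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^{2}}
```

That is, the population covariance of x and y divided by the population variance of x, which is the least-squares slope of y on x.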

There is also an error in the print statement, due to an accidental = sign: print = (result)

However, it is easy to see the value produced by each line just by adding simple print statements, as in the modified code below:

import numpy as np

def calculate_ratio(data1, data2):
    mean_product = np.mean(data1 * data2)
    
    mean_data1 = np.mean(data1)
    mean_data2 = np.mean(data2)
    mean_product_of_means = mean_data1 * mean_data2
    
    numerator = mean_product - mean_product_of_means
    print('numerator:', numerator)
    mean_squared_data1 = np.mean(data1 ** 2)
    
    squared_mean_data1 = mean_data1 ** 2
    print('mean_squared_data1:', mean_squared_data1)
    print('squared_mean_data1:', squared_mean_data1)

    denominator = mean_squared_data1 - squared_mean_data1
    print('denominator:', denominator)
    ratio = numerator / denominator
    return ratio

data1 = np.array([1, 2, 3])
data2 = np.array([4, 5, 6])

result = calculate_ratio(data1, data2)
print('the result is:', result)

which returns this:

numerator: 0.6666666666666661
mean_squared_data1: 4.666666666666667
squared_mean_data1: 4.0
denominator: 0.666666666666667
the result is: 0.9999999999999987

One can now compare each intermediate value with what one would expect from the equations.
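Instead of comparing the printed numbers by eye, the same intermediate values can be checked against NumPy's own functions. A small sketch, assuming the numerator is meant to be the population covariance (`np.cov` with `bias=True`, which divides by n) and the denominator the population variance of data1 (`np.var`, whose default `ddof=0` also divides by n):

```python
import numpy as np

data1 = np.array([1, 2, 3])
data2 = np.array([4, 5, 6])

numerator = np.mean(data1 * data2) - np.mean(data1) * np.mean(data2)
denominator = np.mean(data1 ** 2) - np.mean(data1) ** 2

# np.cov returns the 2x2 covariance matrix; bias=True divides by n, matching np.mean
cov_matrix = np.cov(data1, data2, bias=True)
print(np.isclose(numerator, cov_matrix[0, 1]))  # True
print(np.isclose(denominator, np.var(data1)))   # True
print(numerator / denominator)
```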

Upvotes: 0
