Reputation: 1
This is my first question. I have written a Python program that calculates a value from two data series by applying a function. The code that computes the value I need is:
import numpy as np
def calculate_ratio(data1, data2):
    mean_product = np.mean(data1 * data2)
    mean_data1 = np.mean(data1)
    mean_data2 = np.mean(data2)
    mean_product_of_means = mean_data1 * mean_data2
    numerator = mean_product - mean_product_of_means
    mean_squared_data1 = np.mean(data1 ** 2)
    squared_mean_data1 = mean_data1 ** 2
    denominator = mean_squared_data1 - squared_mean_data1
    ratio = numerator / denominator
    return ratio
data1 = np.array([1, 2, 3])
data2 = np.array([4, 5, 6])
result = calculate_ratio(data1, data2)
print(result)
I need to bootstrap this value to calculate its confidence interval in Python.
I don't understand how to do it. I'm very new to programming and have been trying to use ChatGPT for help, but what it suggests doesn't work. This is what ChatGPT suggests:
def calculate_ratio(data1, data2):
    mean_product = np.mean(data1 * data2)
    product_of_means = np.mean(data1) * np.mean(data2)
    numerator = mean_product - product_of_means
    mean_squared_data1 = np.mean(data1 ** 2)
    squared_mean_data2 = np.mean(data2) ** 2
    denominator = mean_squared_data1 - squared_mean_data2
    ratio = numerator / denominator
    return ratio

def bootstrap_ratio(data1, data2, func, num_iterations):
    results = []
    n = len(data1)
    for _ in range(num_iterations):
        indices1 = np.random.choice(n, n, replace=True)
        indices2 = np.random.choice(n, n, replace=True)
        data1_bootstrap = data1[indices1]
        data2_bootstrap = data2[indices2]
        result = func(data1_bootstrap, data2_bootstrap)
        results.append(result)
    return results
data1 = np.array([1, 2, 3, 4, 5])
data2 = np.array([5, 4, 3, 2, 1])
num_iterations = 1000
bootstrap_results = bootstrap_ratio(data1, data2, calculate_ratio, num_iterations)
confidence_interval = np.percentile(bootstrap_results, [2.5, 97.5])
print(confidence_interval)
What is wrong with this code? (I have installed numpy.) What should I modify to make it work?
Thank you for your attention, and sorry if I used incorrect words.
Upvotes: 0
Views: 83
Reputation: 1
I've solved the problem: the resulting range (the confidence interval) was wrong because two separate sets of indices were drawn, so the function was applied to random values of data1 and unrelated random values of data2. The function has to operate on random values of data1 and the corresponding values of data2. After replacing indices1 and indices2 with a single indices array, the range makes sense.
So:
def bootstrap_ratio(data1, data2, func, num_iterations):
    results = []
    n = len(data1)
    for _ in range(num_iterations):
        indices1 = np.random.choice(n, n, replace=True)
        indices2 = np.random.choice(n, n, replace=True)
        data1_bootstrap = data1[indices1]
        data2_bootstrap = data2[indices2]
        result = func(data1_bootstrap, data2_bootstrap)
        results.append(result)
    return results
becomes
def bootstrap_ratio(data1, data2, func, num_iterations):
    results = []
    n = len(data1)
    for _ in range(num_iterations):
        indices = np.random.choice(n, n, replace=True)
        data1_bootstrap = data1[indices]
        data2_bootstrap = data2[indices]
        result = func(data1_bootstrap, data2_bootstrap)
        results.append(result)
    return results
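Putting the pieces together, a minimal end-to-end sketch of the paired bootstrap might look like the following. It is an illustration under stated assumptions, not a definitive implementation: the name bootstrap_ci, the alpha and seed parameters, and the skipping of degenerate resamples are additions that do not appear in the original posts. It uses the first version of calculate_ratio from the question, whose denominator depends on data1 only; the version suggested by ChatGPT subtracts the squared mean of data2 instead, which is a different quantity.

```python
import numpy as np


def calculate_ratio(data1, data2):
    # Same quantity as the first version in the question:
    # (mean(x*y) - mean(x)*mean(y)) / (mean(x**2) - mean(x)**2)
    numerator = np.mean(data1 * data2) - np.mean(data1) * np.mean(data2)
    denominator = np.mean(data1 ** 2) - np.mean(data1) ** 2
    return numerator / denominator


def bootstrap_ci(data1, data2, func, num_iterations=1000, alpha=0.05, seed=0):
    # Hypothetical helper: percentile confidence interval from a paired bootstrap.
    rng = np.random.default_rng(seed)  # seeded only to make the run reproducible
    n = len(data1)
    results = []
    for _ in range(num_iterations):
        # One set of indices, so each data1 value keeps its data2 partner.
        indices = rng.integers(0, n, size=n)
        d1 = data1[indices]
        d2 = data2[indices]
        # Assumption: skip resamples where every data1 value is identical,
        # because the denominator would be zero and the ratio undefined.
        if d1.max() == d1.min():
            continue
        results.append(func(d1, d2))
    lower, upper = np.percentile(results, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper


data1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
data2 = np.array([5.0, 4.0, 3.0, 2.0, 1.0])

lower, upper = bootstrap_ci(data1, data2, calculate_ratio)
print(lower, upper)
```

Because data2 in this example is an exact linear function of data1, every usable resample gives a ratio of about -1, so the interval collapses to roughly (-1, -1). With only five points, a resample in which all data1 values coincide is not rare, and without the skip such a resample would produce nan and contaminate the percentiles.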
Upvotes: 0
Reputation: 5744
The question does not look like a bootstrap process (in which each value is computed from earlier values), so the title is a bit misleading in that regard.
The code implements mathematics that is not shown in the question, so it is difficult to tell whether it is correct.
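For reference, reading the first calculate_ratio in the question line by line, the value it computes appears to be the expression below, where x stands for data1, y for data2, and an overbar for np.mean. This is inferred from the code, not stated by the asker. It has the form of a population covariance divided by a population variance, which is also the least-squares slope of y on x.

```latex
\text{ratio}
  = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^{2}} - \bar{x}^{2}}
  = \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}
```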
There is also an error in the print statement due to an accidental = sign here: print = (result)
However, it is very easy to see the values of each line just by adding simple print statements as is done in the modified code below:
import numpy as np
def calculate_ratio(data1, data2):
    mean_product = np.mean(data1 * data2)
    mean_data1 = np.mean(data1)
    mean_data2 = np.mean(data2)
    mean_product_of_means = mean_data1 * mean_data2
    numerator = mean_product - mean_product_of_means
    print('numerator:', numerator)
    mean_squared_data1 = np.mean(data1 ** 2)
    squared_mean_data1 = mean_data1 ** 2
    print('mean_squared_data1:', mean_squared_data1)
    print('squared_mean_data1:', squared_mean_data1)
    denominator = mean_squared_data1 - squared_mean_data1
    print('denominator:', denominator)
    ratio = numerator / denominator
    return ratio
data1 = np.array([1, 2, 3])
data2 = np.array([4, 5, 6])
result = calculate_ratio(data1, data2)
print('the result is:', result)
which returns this:
numerator: 0.6666666666666661
mean_squared_data1: 4.666666666666667
squared_mean_data1: 4.0
denominator: 0.666666666666667
the result is: 0.9999999999999987
One can now check these intermediate values against what one would expect from the equations.
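One possible cross-check, offered as a sketch: the ratio has the shape of a covariance divided by a variance, so it can be compared with NumPy's own np.cov (using bias=True for the divide-by-N normalisation that np.mean implies) and with the slope from np.polyfit. The variable names cov_ratio and slope are illustrative and do not come from the question.

```python
import numpy as np


def calculate_ratio(data1, data2):
    # Compact form of the function from the question.
    numerator = np.mean(data1 * data2) - np.mean(data1) * np.mean(data2)
    denominator = np.mean(data1 ** 2) - np.mean(data1) ** 2
    return numerator / denominator


data1 = np.array([1, 2, 3])
data2 = np.array([4, 5, 6])

ratio = calculate_ratio(data1, data2)

# 2x2 covariance matrix; bias=True divides by N, matching np.mean.
cov_matrix = np.cov(data1, data2, bias=True)
cov_ratio = cov_matrix[0, 1] / cov_matrix[0, 0]

# Slope of the least-squares straight line through (data1, data2).
slope = np.polyfit(data1, data2, 1)[0]

print(ratio, cov_ratio, slope)
```

All three numbers should agree up to floating-point rounding, which is consistent with the 0.9999999999999987 shown above being a rounded 1.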
Upvotes: 0