Reputation: 125
I have some data, and I would like a GP that gives me an approximate distribution of the samples that will arise in the future, not uncertainty over the posterior mean.
import numpy as np
import torch
import gpytorch
import matplotlib.pyplot as plt

# Parameters for the solar plant generation
days_in_month = 30
hours_per_day = 24
total_hours = days_in_month * hours_per_day

# Create a baseline solar production pattern for a single day.
# Let's assume a bell-shaped curve: low production at night, high production during the day.
solar_production_day = np.array([0, 0, 0, 0, 0, 10, 20, 40, 50, 60, 75, 85, 90, 85, 75, 60, 50, 40, 20, 10, 0, 0, 0, 0])

# Randomly vary production each day (adding some variability for weather changes)
np.random.seed(42)  # For reproducibility
monthly_solar_production = []
for _ in range(days_in_month):
    monthly_solar_production.extend(solar_production_day * (0.2 + 1.6 * np.random.rand(hours_per_day)))

# Build the training tensors: hour of the day as input, solar production as target
train_x = torch.cat([torch.linspace(0, 23, 24)] * days_in_month).unsqueeze(-1)
train_y = torch.FloatTensor(monthly_solar_production)  # Solar production
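For reference, a quick sanity check of the synthetic data (a sketch that only reuses the variables built above) confirms the shapes GPyTorch's ExactGP expects: a 2-D input tensor and a 1-D target tensor.

# Sanity check of the training tensors (sketch; uses the variables defined above)
print(train_x.shape)  # torch.Size([720, 1]) -- hour-of-day feature, one column
print(train_y.shape)  # torch.Size([720])    -- one noisy production value per hour

# Optionally compare one simulated day against the baseline curve
plt.plot(solar_production_day, label="baseline day")
plt.plot(train_y[:24].numpy(), label="first simulated day")
plt.legend()
plt.show()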
And I train the following model:
# Define the GP model
class SimpleGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(SimpleGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.RBFKernel()

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

# Initialize the Gaussian likelihood with a noise value
likelihood = gpytorch.likelihoods.GaussianLikelihood()
likelihood.noise = torch.tensor(0.1)  # You can adjust this value for the initial noise level

# Define the GP model
model = SimpleGPModel(train_x, train_y, likelihood)

# Find optimal model hyperparameters
model.train()
likelihood.train()

# Use the Adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

# Marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

# Training loop
training_iterations = 200
for i in range(training_iterations):
    optimizer.zero_grad()
    output = model(train_x)
    loss = -mll(output, train_y)
    loss.backward()
    optimizer.step()
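After training, it can help to look at the learned hyperparameters, since the likelihood's noise term is the part of the model that describes the spread of observations around the mean function. A minimal sketch, assuming the trained model and likelihood above:

# Inspect the fitted hyperparameters (sketch; assumes the training loop above has run)
print("Final negative MLL:     ", loss.item())
print("Learned noise variance: ", likelihood.noise.item())
print("Learned RBF lengthscale:", model.covar_module.lengthscale.item())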
I can visualise what is going on with the following:
# Switch to evaluation mode
model.eval()
likelihood.eval()

# Test data: one full day of hours
test_x = torch.linspace(0, 23, 24).unsqueeze(-1)

# Get predictions from the model
with torch.no_grad():
    observed_pred = likelihood(model(test_x))
    mean = observed_pred.mean
    lower, upper = observed_pred.confidence_region()
    variance = observed_pred.variance
    print("Variance", variance)

# Plot the results
plt.figure(figsize=(10, 6))
plt.plot(train_x.numpy(), train_y.numpy(), 'k*', label='Observed Data')
plt.plot(test_x.numpy(), mean.numpy(), 'b', label='Predicted Mean')
plt.fill_between(test_x.squeeze().numpy(), lower.numpy(), upper.numpy(), alpha=0.3, label='Confidence Interval')
plt.title("Simple Gaussian Process Model for Solar Production")
plt.xlabel("Hour of the Day")
plt.ylabel("Solar Production")
plt.legend()
plt.show()
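One way to look at "the samples that will arise in the future" directly is to draw Monte Carlo samples from the predictive distribution computed above; observed_pred is a multivariate normal, so it supports sampling. A minimal sketch, assuming the observed_pred, mean, and test_x from the code above:

# Draw Monte Carlo samples from the predictive distribution (sketch)
with torch.no_grad():
    future_samples = observed_pred.sample(torch.Size([100]))  # shape: [100, 24]

plt.figure(figsize=(10, 6))
plt.plot(test_x.squeeze().numpy(), future_samples.T.numpy(), color='grey', alpha=0.1)
plt.plot(test_x.squeeze().numpy(), mean.numpy(), 'b', label='Predicted Mean')
plt.title("Samples from the predictive distribution")
plt.xlabel("Hour of the Day")
plt.ylabel("Solar Production")
plt.legend()
plt.show()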
It is clear in this figure that the uncertainty is measuring the variability in the posterior functions.
Instead, I would like a GP that gives me uncertainty over the samples that will arise in the future, not over the mean that governs them. I could compute the empirical variance at each hour, and that would give me a predicted distribution of the samples that will appear in the future, but this would only work for this particular case, and I would like my approach to be generalisable.
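For context, the distinction I am asking about seems to map onto GPyTorch's two outputs: model(test_x) is the posterior over the latent function, while likelihood(model(test_x)) also folds in the observation-noise variance learned by the GaussianLikelihood, so its variance should describe future samples rather than just the mean. A minimal sketch (assuming the trained model above) comparing the two:

# Compare latent-function uncertainty with predictive uncertainty (sketch)
with torch.no_grad():
    latent_pred = model(test_x)             # posterior over the underlying mean function
    predictive = likelihood(model(test_x))  # adds the learned observation-noise variance

print("Latent variance:    ", latent_pred.variance)
print("Predictive variance:", predictive.variance)
print("Learned noise:      ", likelihood.noise)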
Upvotes: 0
Views: 8