Reputation: 125
I have some data, and I would like a GP that gives me an approximate distribution of the samples that will arise in the future, not uncertainty over the posterior mean.
import numpy as np
import torch
import gpytorch
import matplotlib.pyplot as plt

# Parameters for the solar plant generation
days_in_month = 30
hours_per_day = 24
total_hours = days_in_month * hours_per_day

# Create a baseline solar production pattern for a single day.
# Let's assume a bell-shaped curve: low production at night, high production during the day.
solar_production_day = np.array([0, 0, 0, 0, 0, 10, 20, 40, 50, 60, 75, 85, 90, 85, 75, 60, 50, 40, 20, 10, 0, 0, 0, 0])

# Randomly vary production each day (adding some variability for weather changes)
np.random.seed(42)  # For reproducibility
monthly_solar_production = []
for _ in range(days_in_month):
    monthly_solar_production.extend(solar_production_day * (0.2 + 1.6 * np.random.rand(hours_per_day)))

# Build the training tensors: hour of the day as input, solar production as target
train_x = torch.cat([torch.linspace(0, 23, 24)] * days_in_month).unsqueeze(-1)
train_y = torch.FloatTensor(monthly_solar_production)  # Solar production
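For reference, a quick sanity check of the synthetic data (a sketch that only reuses the variables built above) confirms the shapes GPyTorch's ExactGP expects: a 2-D input tensor and a 1-D target tensor.

# Sanity check of the training tensors (sketch; uses the variables defined above)
print(train_x.shape)  # torch.Size([720, 1]) -- hour-of-day feature, one column
print(train_y.shape)  # torch.Size([720])    -- one noisy production value per hour

# Optionally compare one simulated day against the baseline curve
plt.plot(solar_production_day, label="baseline day")
plt.plot(train_y[:24].numpy(), label="first simulated day")
plt.legend()
plt.show()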
And I train the following model:
# Define the GP model
class SimpleGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(SimpleGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.RBFKernel()

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

# Initialize the Gaussian likelihood with a noise value
likelihood = gpytorch.likelihoods.GaussianLikelihood()
likelihood.noise = torch.tensor(0.1)  # You can adjust this value for the initial noise level

# Define the GP model
model = SimpleGPModel(train_x, train_y, likelihood)

# Find optimal model hyperparameters
model.train()
likelihood.train()

# Use the Adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

# Marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

# Training loop
training_iterations = 200
for i in range(training_iterations):
    optimizer.zero_grad()
    output = model(train_x)
    loss = -mll(output, train_y)
    loss.backward()
    optimizer.step()
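After training, it can help to look at the learned hyperparameters, since the likelihood's noise term is the part of the model that describes the spread of observations around the mean function. A minimal sketch, assuming the trained model and likelihood above:

# Inspect the fitted hyperparameters (sketch; assumes the training loop above has run)
print("Final negative MLL:     ", loss.item())
print("Learned noise variance: ", likelihood.noise.item())
print("Learned RBF lengthscale:", model.covar_module.lengthscale.item())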
I can visualise what is going on with the following:
# Switch to evaluation mode
model.eval()
likelihood.eval()

# Test data: one full day of hours
test_x = torch.linspace(0, 23, 24).unsqueeze(-1)

# Get predictions from the model
with torch.no_grad():
    observed_pred = likelihood(model(test_x))
    mean = observed_pred.mean
    lower, upper = observed_pred.confidence_region()
    variance = observed_pred.variance
    print("Variance", variance)

# Plot the results
plt.figure(figsize=(10, 6))
plt.plot(train_x.numpy(), train_y.numpy(), 'k*', label='Observed Data')
plt.plot(test_x.numpy(), mean.numpy(), 'b', label='Predicted Mean')
plt.fill_between(test_x.squeeze().numpy(), lower.numpy(), upper.numpy(), alpha=0.3, label='Confidence Interval')
plt.title("Simple Gaussian Process Model for Solar Production")
plt.xlabel("Hour of the Day")
plt.ylabel("Solar Production")
plt.legend()
plt.show()
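One way to look at "the samples that will arise in the future" directly is to draw Monte Carlo samples from the predictive distribution computed above; observed_pred is a multivariate normal, so it supports sampling. A minimal sketch, assuming the observed_pred, mean, and test_x from the code above:

# Draw Monte Carlo samples from the predictive distribution (sketch)
with torch.no_grad():
    future_samples = observed_pred.sample(torch.Size([100]))  # shape: [100, 24]

plt.figure(figsize=(10, 6))
plt.plot(test_x.squeeze().numpy(), future_samples.T.numpy(), color='grey', alpha=0.1)
plt.plot(test_x.squeeze().numpy(), mean.numpy(), 'b', label='Predicted Mean')
plt.title("Samples from the predictive distribution")
plt.xlabel("Hour of the Day")
plt.ylabel("Solar Production")
plt.legend()
plt.show()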
It is clear in this figure that the uncertainty is measuring the variability in the posterior functions.
Instead, I would like a GP that gives me uncertainty over the samples that will arise in the future, not over the mean that governs them. I could compute the empirical variance at each hour, and that would give me a predicted distribution of the samples that will appear in the future, but this would only work for this particular case, and I would like my approach to be generalisable.
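For context, the distinction I am asking about seems to map onto GPyTorch's two outputs: model(test_x) is the posterior over the latent function, while likelihood(model(test_x)) also folds in the observation-noise variance learned by the GaussianLikelihood, so its variance should describe future samples rather than just the mean. A minimal sketch (assuming the trained model above) comparing the two:

# Compare latent-function uncertainty with predictive uncertainty (sketch)
with torch.no_grad():
    latent_pred = model(test_x)             # posterior over the underlying mean function
    predictive = likelihood(model(test_x))  # adds the learned observation-noise variance

print("Latent variance:    ", latent_pred.variance)
print("Predictive variance:", predictive.variance)
print("Learned noise:      ", likelihood.noise)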
Upvotes: 0
Views: 8