Justin Chapweske
Justin Chapweske

Reputation: 43

Bandit-like Algorithm to Optimize Parameters?

I need an algorithm to optimize the time of the week that I show a message to a user to ensure the highest probability that the user will click the message.

When the message is shown, a database entry will be updated with the day/time and whether or not the user clicked. The goal is to maximize the click through rate.

I am very comfortable with using Bayesian Bandits (also known as Thompson Sampling) (https://github.com/omphalos/bayesian-bandit.js) to optimize in the case of N discrete parameters, but I am at a loss on how to apply this to a continuous value.

I'm well aware of standard hill climbing algorithms, but I only understand how to apply hill-climbing in the absence of statistical noise. Is there a simple bayesian way to do hill climbing that optimizes the exploration/exploitation tradeoff?

For bonus points, is there a method that can be generalized to multiple dimensions, so optimizing multiple parameters simultaneously to find optimal points in a multi-dimensional space?

Upvotes: 1

Views: 730

Answers (2)

Ben Allison
Ben Allison

Reputation: 7394

I suggest you view the reward function as a Gaussian Process to make this nice and Bayesian in the presence of continuous parameters. Essentially you have a regression problem where payoff(t) is a function to be estimated for continuous t, and you want a strategy for picking values of t which trades off exploration (regions of function space with high posterior variance) with exploitation (regions of function space with high expectation).

There's prior work on this, for example this paper and other works by the author.

Upvotes: 2

Pradhan
Pradhan

Reputation: 16737

Closely related to Bayesian bandits are Bayesian mixture models. You can think of a Bayesian bandit as being a Bayesian mixture of delta functions. This removes the discreteness constraint. Instead, you can model the distribution over your continuous space as a sum of continuous valued random variables. For example, you could suppose that there are 5 "click sources", each with a normal distribution around the hour(8am, 9am,....), and that the standard deviation is 15 minutes. So, when you get a click at 8:05, you will attribute it heavily to the 8am model, a lesser amount to the 9am one, an even lesser amount to the 10am one and so on.

A commonly used algorithm to estimate mixture models is Expectation-Maximization. You should be able to find good open source implementations. Note that the above description(and EM) continues to hold in the multi-variate case.

Upvotes: 4

Related Questions