How should I code the Gambler's Problem with Q-learning (without any reinforcement learning packages)?

Question

I would like to solve the Gambler's problem as an MDP (Markov Decision Process).

Gambler's problem: A gambler has the opportunity to make bets on the outcomes of a sequence of coin flips. If the coin comes up heads, he wins as many dollars as he has staked on that flip; if it is tails, he loses his stake. The game ends when the gambler wins by reaching his goal of κ dollars, or loses by running out of money. On each flip, the gambler must decide how many (integer) dollars to stake. The probability of heads is p and that of tails is 1 − p.
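
To make the dynamics concrete, here is a minimal sketch of a single flip. The function name `step`, the reward of 1 on reaching the goal (0 otherwise), and the stake limit `min(capital, kappa - capital)` are assumptions following the usual textbook formulation; the problem statement above only requires an integer stake.

```python
import random


def step(capital, stake, kappa=100, p=0.25, rng=random):
    """Simulate one coin flip (hypothetical helper, not part of the original code).

    Returns (next_capital, reward, done).
    """
    # Assumption: a stake is at least 1 and never more than is needed
    # to reach the goal or more than the gambler currently holds.
    if not 1 <= stake <= min(capital, kappa - capital):
        raise ValueError("stake must be in 1..min(capital, kappa - capital)")

    if rng.random() < p:          # heads: win the stake
        next_capital = capital + stake
    else:                         # tails: lose the stake
        next_capital = capital - stake

    done = next_capital == 0 or next_capital == kappa
    reward = 1.0 if next_capital == kappa else 0.0  # assumed reward scheme
    return next_capital, reward, done
```

Calling `step(50, 50)` ends the episode either way: the gambler reaches `kappa` with probability `p` and goes broke otherwise.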

I implemented the model-free Q-learning method using a totally random base policy, but the code is not working as I hoped and I can't figure out why. Thank you for any suggestions. :)

import numpy as np
import matplotlib.pyplot as plt
import random

# data
kappa = 100   # goal
p = 0.25      # probability of heads (winning)
eps = 0.1     # epsilon (0.1, 0.005)
gamma = 0.9   # discount factor
alpha = 0.1   # learning rate (0.1, 1, 10)
n = 1000      # number of training episodes

#Q-learning with totally random base policy
S = [*range(0,kappa+1)] 
A = [*range(0,kappa+1)]

R=np.zeros((kappa+1,kappa+1))
for i in A:
    R[kappa,i]=1

Q=np.zeros((kappa+1,kappa+1))
optimal_policy=np.zeros(kappa+1)

for sa in range(1,kappa):
    i=0
    while i
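
The listing above is cut off inside the loop, so the following is not a reconstruction of it. It is only a minimal sketch of how tabular Q-learning with a uniformly random behaviour policy could be written for this problem, under stated assumptions: stakes are limited to `1..min(s, kappa - s)`, the reward is 1 on reaching `kappa` and 0 otherwise, and each episode starts from a random non-terminal capital. The function name `q_learning` and the `seed` parameter are hypothetical.

```python
import random

import numpy as np


def q_learning(kappa=100, p=0.25, gamma=0.9, alpha=0.1, n_episodes=1000, seed=0):
    """Tabular Q-learning sketch with a uniformly random behaviour policy."""
    rng = random.Random(seed)
    # Q[s, a]: rows 0 and kappa are terminal states and stay at zero.
    Q = np.zeros((kappa + 1, kappa + 1))

    for _ in range(n_episodes):
        s = rng.randint(1, kappa - 1)           # assumed: random non-terminal start
        while 0 < s < kappa:
            a = rng.randint(1, min(s, kappa - s))   # random legal stake
            s_next = s + a if rng.random() < p else s - a
            r = 1.0 if s_next == kappa else 0.0     # assumed reward scheme

            if 0 < s_next < kappa:
                # Maximise only over stakes that are legal in s_next.
                best_next = Q[s_next, 1:min(s_next, kappa - s_next) + 1].max()
            else:
                best_next = 0.0                     # terminal: no future value

            # Standard Q-learning update.
            Q[s, a] += alpha * (r + gamma * best_next - Q[s, a])
            s = s_next

    # Greedy policy over legal stakes; entries 0 and kappa are unused.
    policy = np.zeros(kappa + 1, dtype=int)
    for s in range(1, kappa):
        policy[s] = 1 + int(np.argmax(Q[s, 1:min(s, kappa - s) + 1]))
    return Q, policy
```

Two details in this sketch are worth checking against any implementation that misbehaves: the maximum in the update is taken only over legal stakes of the next state, and terminal states contribute no bootstrapped value. Note also that a learning rate above 1 makes the update overshoot its target, so `alpha` would normally stay in (0, 1].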
