I am using the TD3 algorithm to schedule the charging and discharging of electric vehicles (EVs) under time-of-use electricity pricing. I have defined three reward components: r_anx, r_dep, and r_price. The training results are shown in the attached figure. I want to minimize cost (i.e., charge at higher power during low-price periods and reduce power or discharge during high-price periods) while keeping r_dep as close to zero as possible. How can I achieve this?
(Attached figure: training curves for r_anx, r_dep, and r_price.)
    # assumes: import numpy as np
    def calculateReward(self, action, kp=5, kx=10, kd=35, ks=100):
        """Compute the shaped reward for the current time step."""
        price = self.origin_data[self.t_index][1]

        # Map hour counters into 1..24 (hour 0 is treated as 24).
        t_now = 24 if self.t % 24 == 0 else self.t % 24
        t_anx = 24 if self.t_x % 24 == 0 else self.t_x % 24
        t_dep = 24 if self.t_d % 24 == 0 else self.t_d % 24

        # np.zeros(1) is clearer than np.ndarray(shape=(1,), buffer=...).
        r_anx = np.zeros(1)
        r_price = np.zeros(1)
        r_dep = np.zeros(1)
        r_soc = np.zeros(1)
        r_de_penalty = np.zeros(1)  # note: never assigned below, so it is always 0

        # Unwrap hours that fall past midnight relative to the arrival time t_a.
        if t_now < self.t_a:
            t_now += 24
        if t_dep < self.t_a:
            t_dep += 24
        if t_anx < self.t_a:
            t_anx += 24

        if t_now < t_anx:
            # Before the anxiety window: only the electricity-cost term.
            r_price = -kp * (action * price)
        elif t_now + 1 == t_dep:
            # Last step before departure: penalize any SOC shortfall below soc_d.
            # Note this branch is checked before the anxiety window, so r_anx
            # is skipped on this step.
            r_dep = -kd * max(self.soc_d - self.soc, 0)
            r_price = -kp * (action * price)
        elif t_anx <= t_now < t_dep:
            # Inside the anxiety window: cost term plus range-anxiety penalty.
            r_price = -kp * (action * price)
            r_anx = -kx * max(self.soc_x - self.soc, 0)

        # Penalize the SOC leaving [0, 1].
        if self.soc > 1 or self.soc < 0:
            r_soc = -ks

        r = r_price + r_anx + r_dep + r_soc + r_de_penalty
        return r, r_anx, r_price, r_dep
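To show how the three terms trade off in magnitude, here is a minimal standalone sketch of the same reward shaping for a single time step. The function name and the sample values for price, action, and SOC targets are my own illustrative assumptions, not taken from the environment above:

```python
def shaped_reward(action, price, soc, soc_d, soc_x,
                  at_departure, in_anxiety_window,
                  kp=5, kx=10, kd=35):
    """Standalone version of the three reward terms for one step."""
    # Cost term: negative when charging (action > 0), positive when discharging.
    r_price = -kp * action * price
    # Departure term: penalize SOC shortfall below soc_d on the last step.
    r_dep = -kd * max(soc_d - soc, 0.0) if at_departure else 0.0
    # Anxiety term: penalize SOC below soc_x inside the anxiety window.
    r_anx = -kx * max(soc_x - soc, 0.0) if in_anxiety_window else 0.0
    return r_price + r_anx + r_dep, r_anx, r_price, r_dep

# Hypothetical step: charging at 0.5 during a cheap-tariff hour (price = 0.1),
# with SOC still well below the departure target soc_d.
r, r_anx, r_price, r_dep = shaped_reward(
    action=0.5, price=0.1, soc=0.4, soc_d=0.9, soc_x=0.6,
    at_departure=True, in_anxiety_window=False)
# r_price = -0.25, r_dep = -17.5: the departure penalty dominates the cost term
# by two orders of magnitude, which can make the agent ignore the price signal.
```

With these sample numbers the departure penalty is roughly 70 times larger than the cost term, which is one way such reward weights can produce the imbalance seen in the training curves.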
I want the trained agent to minimize the cost of EV charging and discharging while keeping r_dep as close to zero as possible.