Reputation: 1
I have a system where a car's position is adjusted using PID control, and the car moves along a road with several measurement points (x, y coordinates). I want to improve the precision of the PID controller using a reinforcement learning (RL) agent, and I referred to the MATLAB example "Tune PI Controller Using Reinforcement Learning" for guidance. The challenge I'm facing now is implementing my system in Simulink: training should follow a similar approach to the example, but I am unsure how to structure the Simulink model so that the RL agent tunes the PID parameters. Any advice or steps to help with this integration would be much appreciated!
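From what I understand of the example, the key pieces are (1) a Simulink model containing an RL Agent block wired to an observation vector, a reward signal, and an isdone flag, and (2) a MATLAB script that wraps that model in an environment and trains the agent. Below is a rough sketch of how I imagine the environment setup would look for my car model; the model name 'carPIDTune', the block path, and the choice of signals (position error along the road and its integral) are placeholders I made up, not code from the example:

% Sketch only: wrap my own Simulink model in an RL environment
mdlCar = 'carPIDTune';                 % placeholder name for my car model
agentBlk = [mdlCar '/RL Agent'];       % RL Agent block inside the model

% Observation: [integral of position error; position error], mirroring the example
obsInfo = rlNumericSpec([2 1]);
obsInfo.Name = 'observations';

% Action: the control signal that drives the car (scalar)
actInfo = rlNumericSpec([1 1]);
actInfo.Name = 'control';

% Create the environment from the Simulink model
envCar = rlSimulinkEnv(mdlCar,agentBlk,obsInfo,actInfo);

% Optionally randomize the reference waypoints at the start of each episode
% envCar.ResetFcn = @(in) mySetWaypoints(in);   % hypothetical reset function

Inside the Simulink model, I assume I would compute the position error from the (x, y) measurement points, feed the error and its integral to the RL Agent block as the observation, and build the reward from the tracking error (possibly with a penalty on the control effort), similar in spirit to the water tank example. Is that the right structure?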
Here is the code and model from the example that I am referring to, and that I would need to modify for my system.
Reinforcement learning model of the water tank system (from the MATLAB example):
% Open the baseline water tank model and set the sample time and simulation duration
open_system('watertankLQG');
Ts = 0.1;
Tf = 10;

% Baseline PI gains obtained with Control System Tuner (used later for comparison)
controlSystemTuner("ControlSystemTunerSession");
Kp_CST = 9.80199999804512;
Ki_CST = 1.00019996230706e-06;

% Open the PID-tuning model that contains the RL Agent block and create the environment
mdl = 'rlwatertankPIDTune';
open_system(mdl);
[env,obsInfo,actInfo] = localCreatePIDEnv(mdl);
numObs = prod(obsInfo.Dimension);
numAct = prod(actInfo.Dimension);

% Fix the random seed and set the initial actor weights (the [Ki Kp] gains)
rng(0);
initialGain = single([1e-3 2]);
% Actor: a single fully connected layer whose two weights act as the [Ki Kp] gains
% (fullyConnectedPILayer is a custom layer provided with the example)
actorNet = [
    featureInputLayer(numObs)
    fullyConnectedPILayer(initialGain,'ActOutLyr')
    ];
actorNetGraph = layerGraph(actorNet);
actorNet = dlnetwork(actorNetGraph);
actor = rlContinuousDeterministicActor(actorNet,obsInfo,actInfo);
% Critics: TD3 uses two Q-value critics with the same architecture
% (localCreateCriticNetwork returns a layer graph with inputs 'stateInLyr' and 'actionInLyr',
% so it is passed to dlnetwork directly)
criticNet = localCreateCriticNetwork(numObs,numAct);
critic1 = rlQValueFunction(dlnetwork(criticNet), ...
    obsInfo,actInfo, ...
    ObservationInputNames='stateInLyr', ...
    ActionInputNames='actionInLyr');
critic2 = rlQValueFunction(dlnetwork(criticNet), ...
    obsInfo,actInfo, ...
    ObservationInputNames='stateInLyr', ...
    ActionInputNames='actionInLyr');
critic = [critic1 critic2];
% Optimizer and agent options
actorOpts = rlOptimizerOptions( ...
    LearnRate=1e-3, ...
    GradientThreshold=1);
criticOpts = rlOptimizerOptions( ...
    LearnRate=1e-3, ...
    GradientThreshold=1);
agentOpts = rlTD3AgentOptions( ...
    SampleTime=Ts, ...
    MiniBatchSize=128, ...
    ExperienceBufferLength=1e6, ...
    ActorOptimizerOptions=actorOpts, ...
    CriticOptimizerOptions=criticOpts);
agentOpts.TargetPolicySmoothModel.StandardDeviation = sqrt(0.1);

% Create the TD3 agent
agent = rlTD3Agent(actor,critic,agentOpts);
% Training options: stop once the average reward over 100 episodes reaches -250
maxepisodes = 1000;
maxsteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=maxepisodes, ...
    MaxStepsPerEpisode=maxsteps, ...
    ScoreAveragingWindowLength=100, ...
    Verbose=false, ...
    Plots="training-progress", ...
    StopTrainingCriteria="AverageReward", ...
    StopTrainingValue=-250);
doTraining = true;
if doTraining
    % plot(env);
    % Train the agent.
    trainingStats = train(agent,env,trainOpts);
else
    % Load pretrained agent for the example.
    load("WaterTankPIDtd3.mat","agent")
end
% Simulate the trained agent and read the tuned gains from the actor weights
simOpts = rlSimulationOptions(MaxSteps=maxsteps);
experiences = sim(env,agent,simOpts);
actor = getActor(agent);
parameters = getLearnableParameters(actor);
Ki = abs(parameters{1}(1))
Kp = abs(parameters{1}(2))
% Apply the RL-tuned gains to the PID Controller block and simulate
mdlTest = 'watertankLQG';
open_system(mdlTest);
set_param([mdlTest '/PID Controller'],'P',num2str(Kp))
set_param([mdlTest '/PID Controller'],'I',num2str(Ki))
sim(mdlTest)
rlStep = simout;   % simout and cost are logged to the workspace by the model
rlCost = cost;
rlStabilityMargin = localStabilityAnalysis(mdlTest);

% Repeat with the Control System Tuner gains for comparison
set_param([mdlTest '/PID Controller'],'P',num2str(Kp_CST))
set_param([mdlTest '/PID Controller'],'I',num2str(Ki_CST))
sim(mdlTest)
cstStep = simout;
cstCost = cost;
cstStabilityMargin = localStabilityAnalysis(mdlTest);
figure
plot(cstStep)
hold on
plot(rlStep)
grid on
legend('Control System Tuner','RL',Location="southeast")
title('Step Response')
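For reference, the script above also calls local helper functions (localCreatePIDEnv, localCreateCriticNetwork, localStabilityAnalysis) that live at the bottom of the example file and are not shown in my post. My understanding of the first two is roughly the following; this is my own paraphrase of the pattern, so the exact layer sizes and names may differ from the shipped example:

function [env,obsInfo,actInfo] = localCreatePIDEnv(mdl)
% Observation: [integrated error; error]; action: the PID control signal
obsInfo = rlNumericSpec([2 1]);
obsInfo.Name = 'observations';
actInfo = rlNumericSpec([1 1]);
actInfo.Name = 'PID output';
% Wrap the RL Agent block of the model in an environment interface
env = rlSimulinkEnv(mdl,[mdl '/RL Agent'],obsInfo,actInfo);
% (The example also sets env.ResetFcn to randomize the reference each episode.)
end

function criticNet = localCreateCriticNetwork(numObs,numAct)
% Two input paths (state and action) merged into a single Q-value output
statePath = [
    featureInputLayer(numObs,Name='stateInLyr')
    fullyConnectedLayer(32,Name='fc1')];
actionPath = [
    featureInputLayer(numAct,Name='actionInLyr')
    fullyConnectedLayer(32,Name='fc2')];
commonPath = [
    concatenationLayer(1,2,Name='concat')
    reluLayer
    fullyConnectedLayer(32)
    reluLayer
    fullyConnectedLayer(1,Name='qvalOutLyr')];
criticNet = layerGraph(statePath);
criticNet = addLayers(criticNet,actionPath);
criticNet = addLayers(criticNet,commonPath);
criticNet = connectLayers(criticNet,'fc1','concat/in1');
criticNet = connectLayers(criticNet,'fc2','concat/in2');
end

So for my car system the open question is mainly what to put inside the Simulink model (error computation, reward, isdone) so that this kind of environment-creation code can wrap it.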
Any suggestions on how to approach this, or references to similar examples, would be greatly appreciated. If anyone has resources or insights on implementing this in Simulink, I would be grateful for your help!
Upvotes: 0
Views: 45