frank li

Reputation: 1

How to Use MATLAB's RL Toolbox to Build a Reinforcement Learning Agent for Controlling PID Parameters in Track Control?

I have a system where a car's position is adjusted using PID control, and the car moves along a road with several measurement points (x, y coordinates). I want to improve the precision of the PID controller using a reinforcement learning (RL) agent, and I referred to the MATLAB example "Tune PI Controller Using Reinforcement Learning" for guidance. The challenge I'm facing now is implementing my system in Simulink: training should follow a similar approach to that example, but I am unsure how to structure the Simulink model so that the RL agent controls the PID parameters. Any advice or steps to help with this integration would be much appreciated!

Here is the code and model I am referring to; my question is how I should modify it for my own system.
Reinforcement learning model of the water tank system:

% Open the baseline water tank model and define sample/stop times
open_system('watertankLQG');
Ts = 0.1;   % sample time (s)
Tf = 10;    % simulation stop time (s)

% Baseline PI gains obtained with Control System Tuner, used later for comparison
controlSystemTuner("ControlSystemTunerSession");
Kp_CST = 9.80199999804512;
Ki_CST = 1.00019996230706e-06;
% Open the RL version of the model and create the environment
mdl = 'rlwatertankPIDTune';
open_system(mdl);
[env,obsInfo,actInfo] = localCreatePIDEnv(mdl);  % helper function defined in the example
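% (As I understand it, localCreatePIDEnv roughly creates a 2-element
% observation spec for [integral of error; error] and a scalar action spec
% for the control signal, wraps the model's RL Agent block with
% rlSimulinkEnv, and sets a reset function that randomizes the reference
% each episode. This is my paraphrase, not the exact code.)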
numObs = prod(obsInfo.Dimension);
numAct = prod(actInfo.Dimension);
rng(0);  % fix the random seed for reproducibility
initialGain = single([1e-3 2]);  % initial actor weights, interpreted as [Ki Kp]

% Actor: a single custom layer whose two weights play the role of the PI gains
actorNet = [
    featureInputLayer(numObs)
    fullyConnectedPILayer(initialGain,'ActOutLyr')  % custom layer shipped with the example
    ];
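% (fullyConnectedPILayer appears to output abs(Weights)*input, so the two
% learned weights can be read back as nonnegative gains [Ki Kp]; that would
% explain the abs() calls when the gains are extracted further down.)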
actorNetGraph = layerGraph(actorNet);
actorNet = dlnetwork(actorNetGraph);


% Wrap the network in a deterministic actor
actor = rlContinuousDeterministicActor(actorNet,obsInfo,actInfo);


% Critics: TD3 uses two Q-value critics built from the same network template
criticNet = localCreateCriticNetwork(numObs,numAct);  % helper function defined in the example
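% (As far as I can tell, localCreateCriticNetwork builds a Q-network with a
% state input named 'stateInLyr' and an action input named 'actionInLyr',
% which are concatenated and passed through fully connected and relu layers
% down to a scalar Q-value; those input names are referenced below.)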
criticNetGraph = layerGraph(criticNet);
critic1 = rlQValueFunction(dlnetwork(criticNetGraph), ...
    obsInfo,actInfo,...
    ObservationInputNames='stateInLyr', ...
    ActionInputNames='actionInLyr');

critic2 = rlQValueFunction(dlnetwork(criticNetGraph), ...
    obsInfo,actInfo,...
    ObservationInputNames='stateInLyr', ...
    ActionInputNames='actionInLyr');
critic = [critic1 critic2];
% Optimizer options for the actor and the critics
actorOpts = rlOptimizerOptions( ...
    LearnRate=1e-3, ...
    GradientThreshold=1);

criticOpts = rlOptimizerOptions( ...
    LearnRate=1e-3, ...
    GradientThreshold=1);
% Assemble the TD3 agent
agentOpts = rlTD3AgentOptions( ...
    SampleTime=Ts, ...
    MiniBatchSize=128, ...
    ExperienceBufferLength=1e6, ...
    ActorOptimizerOptions=actorOpts, ...
    CriticOptimizerOptions=criticOpts);
agentOpts.TargetPolicySmoothModel.StandardDeviation = sqrt(0.1);  % target-policy smoothing noise
agent = rlTD3Agent(actor,critic,agentOpts);
% Training options: stop once the 100-episode average reward reaches -250
maxepisodes = 1000;
maxsteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=maxepisodes, ...
    MaxStepsPerEpisode=maxsteps, ...
    ScoreAveragingWindowLength=100, ...
    Verbose=false, ...
    Plots="training-progress", ...
    StopTrainingCriteria="AverageReward", ...
    StopTrainingValue=-250);

doTraining = true;

if doTraining
    % plot(env);
    % Train the agent.
    trainingStats = train(agent,env,trainOpts);
else
    % Load pretrained agent for the example.
    load("WaterTankPIDtd3.mat","agent")
end

% Simulate the trained agent, then read the learned gains off the actor weights
simOpts = rlSimulationOptions(MaxSteps=maxsteps);
experiences = sim(env,agent,simOpts);
actor = getActor(agent);
parameters = getLearnableParameters(actor);
Ki = abs(parameters{1}(1))  % abs() because the custom layer uses abs(Weights)
Kp = abs(parameters{1}(2))

% Apply the RL-tuned gains to the baseline model and evaluate
mdlTest = 'watertankLQG';
open_system(mdlTest);
set_param([mdlTest '/PID Controller'],'P',num2str(Kp))
set_param([mdlTest '/PID Controller'],'I',num2str(Ki))
sim(mdlTest)
rlStep = simout;  % step response logged to the workspace by the model
rlCost = cost;    % LQG cost logged to the workspace by the model
rlStabilityMargin = localStabilityAnalysis(mdlTest);  % helper function defined in the example
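% (I have not verified localStabilityAnalysis; I believe it linearizes the
% loop and returns the stability margins, but that is an assumption on my
% part.)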

% Repeat with the Control System Tuner gains for comparison
set_param([mdlTest '/PID Controller'],'P',num2str(Kp_CST))
set_param([mdlTest '/PID Controller'],'I',num2str(Ki_CST))
sim(mdlTest)
cstStep = simout;
cstCost = cost;
cstStabilityMargin = localStabilityAnalysis(mdlTest);
% Compare the two step responses
figure
plot(cstStep)
hold on
plot(rlStep)
grid on
legend('Control System Tuner','RL',Location="southeast")
title('Step Response')
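
For my own track-control system, I imagine the environment setup would look something like the sketch below. All of the names here ('trackPIDTune', the 'trackPIDTune/RL Agent' block path, and localTrackResetFcn) are placeholders I made up, and the observation/action shapes simply mirror the water tank example, so I am not sure this is the right structure:

% Hypothetical sketch of an environment for my track model
% (placeholder names, not taken from the MATLAB example)
mdlTrack = 'trackPIDTune';  % my Simulink model containing an RL Agent block
open_system(mdlTrack);

% Observation: [integral of lateral error; lateral error], as in the example
obsInfo = rlNumericSpec([2 1]);
obsInfo.Name = 'observations';

% Action: the scalar control signal produced by the PI structure
actInfo = rlNumericSpec([1 1]);
actInfo.Name = 'control';

% Wrap the RL Agent block inside the model as the environment
env = rlSimulinkEnv(mdlTrack,[mdlTrack '/RL Agent'],obsInfo,actInfo);

% A reset function could randomize the starting measurement point (x, y)
% on the road each episode; localTrackResetFcn would be my own helper
env.ResetFcn = @(in)localTrackResetFcn(in,mdlTrack);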

Any suggestions on how to approach this, or references to similar examples, would be greatly appreciated. If anyone has resources or insights on implementing this in Simulink, I would be grateful for your help!

Upvotes: 0

Views: 45

Answers (0)
