I'm creating a video game that uses artificial intelligence for the enemies. I had just started a training session and everything began fine, but about 10 seconds later it gave me this error:
C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\torch_entities\utils.py:289: UserWarning: The use of `x.T` on tensors of dimension other than 2 to reverse their shape is deprecated and it will throw an error in a future release. Consider `x.mT` to transpose batches of matrices or `x.permute(*torch.arange(x.ndim - 1, -1, -1))` to reverse the dimensions of a tensor. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:3641.)
torch.nn.functional.one_hot(_act.T, action_size[i]).float()
C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\torch\onnx\symbolic_opset9.py:4662: UserWarning: Exporting a model to ONNX with a batch_size other than 1, with a variable length with LSTM can cause an error when running the ONNX model with a different batch size. Make sure to save the model with a batch size of 1, or define the initial states (h0/c0) as inputs of the model.
warnings.warn(
[INFO] Exported results\=Jolleen2\Jolleen\Jolleen-1152.onnx
[INFO] Copied results\=Jolleen2\Jolleen\Jolleen-1152.onnx to results\=Jolleen2\Jolleen.onnx.
[INFO] Exported results\=Jolleen2\TrainingPlayer\TrainingPlayer-192.onnx
[INFO] Copied results\=Jolleen2\TrainingPlayer\TrainingPlayer-192.onnx to results\=Jolleen2\TrainingPlayer.onnx.
Traceback (most recent call last):
File "C:\Users\Utente\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\Utente\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\Utente\Fight For Life\MLvenv\Scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\learn.py", line 264, in main
run_cli(parse_command_line())
File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\learn.py", line 260, in run_cli
run_training(run_seed, options, num_areas)
File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\learn.py", line 136, in run_training
tc.start_learning(env_manager)
File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
return func(*args, **kwargs)
File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\trainer_controller.py", line 175, in start_learning
n_steps = self.advance(env_manager)
File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
return func(*args, **kwargs)
File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\trainer_controller.py", line 250, in advance
trainer.advance()
File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\ghost\trainer.py", line 254, in advance
self.trainer.advance()
File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\trainer\rl_trainer.py", line 302, in advance
if self._update_policy():
File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
return func(*args, **kwargs)
File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\trainer\off_policy_trainer.py", line 211, in _update_policy
update_stats = self.optimizer.update(sampled_minibatch, n_sequences)
File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
return func(*args, **kwargs)
File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\sac\optimizer_torch.py", line 573, in update
q1_stream = self._condense_q_streams(q1_out, disc_actions)
File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\sac\optimizer_torch.py", line 467, in _condense_q_streams
branched_q = ModelUtils.break_into_branches(
File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\torch_entities\utils.py", line 270, in break_into_branches
branched_logits = [
File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\torch_entities\utils.py", line 271, in <listcomp>
concatenated_logits[:, action_idx[i] : action_idx[i + 1]]
IndexError: too many indices for tensor of dimension 1
This is the opponent's (enemy agent's) code:
public class Jolleen : Agent
{
[SerializeField] private float moveSpeed = 4f;
[SerializeField] private float sprintSpeed = 10f;
[SerializeField] private float SpeedChangeRate = 10.0f;
[SerializeField] private GameObject ray;
[SerializeField] private float GroundedOffset = -0.14f;
[SerializeField] private float GroundedRadius = 0.25f;
[SerializeField] private LayerMask GroundLayers;
[SerializeField] private float Gravity = -15.0f;
private CharacterController controller;
private Animator _animator;
private EnemyLife myLife;
private float speed;
private float _animationBlend;
private float verticalVelocity;
private bool attack;
private bool Grounded = true;
private float terminalVelocity = 53.0f;
public override void Initialize()
{
controller = GetComponent<CharacterController>();
_animator = GetComponent<Animator>();
myLife = GetComponent<EnemyLife>();
}
public override void CollectObservations(VectorSensor sensor)
{
sensor.AddObservation(transform.localPosition);
sensor.AddObservation(myLife.GetLife());
sensor.AddObservation(ray.transform.rotation.x);
sensor.AddObservation(attack);
}
public override void OnActionReceived(ActionBuffers actions)
{
float moveRotate = actions.ContinuousActions[0];
float moveForward = actions.ContinuousActions[1];
float isSprint = actions.ContinuousActions[2];
if (actions.ContinuousActions[3] == 1) attack = true;
Debug.Log(attack);
float upRay = actions.ContinuousActions[4];
float downRay = actions.ContinuousActions[5];
SetRayRotation(upRay, downRay);
if (!attack)
{
Move(moveForward, isSprint);
}
transform.Rotate(0f, moveRotate * moveSpeed, 0f, Space.Self);
}
public override void Heuristic(in ActionBuffers actionsOut)
{
ActionSegment<float> continuosActions = actionsOut.ContinuousActions;
continuosActions[0] = Input.GetAxisRaw("Horizontal");
continuosActions[1] = Input.GetAxisRaw("Vertical");
continuosActions[2] = Input.GetKey(KeyCode.LeftShift) ? 1 : 0;
continuosActions[3] = Input.GetKey(KeyCode.Mouse0) ? 1 : 0;
continuosActions[4] = Input.GetKey(KeyCode.Alpha9) ? 1 : 0;
continuosActions[5] = Input.GetKey(KeyCode.Alpha0) ? 1 : 0;
}
private void Move(float moveForward, float isSprint)
{
// set target speed based on move speed, sprint speed and if sprint is pressed
float targetSpeed = moveSpeed;
if (isSprint == 1) targetSpeed = sprintSpeed;
if (moveForward == 0) targetSpeed = 0;
// a simplistic acceleration and deceleration designed to be easy to remove, replace, or iterate upon
// a reference to the players current horizontal velocity
float currentHorizontalSpeed = new Vector3(controller.velocity.x, 0.0f, controller.velocity.z).magnitude;
float speedOffset = 0.1f;
float inputMagnitude = 1f;
// accelerate or decelerate to target speed
if (currentHorizontalSpeed < targetSpeed - speedOffset ||
currentHorizontalSpeed > targetSpeed + speedOffset)
{
// creates curved result rather than a linear one giving a more organic speed change
// note T in Lerp is clamped, so we don't need to clamp our speed
speed = Mathf.Lerp(currentHorizontalSpeed, targetSpeed * inputMagnitude,
Time.deltaTime * SpeedChangeRate);
// round speed to 3 decimal places
speed = Mathf.Round(speed * 1000f) / 1000f;
}
else
{
speed = targetSpeed;
}
_animationBlend = Mathf.Lerp(_animationBlend, targetSpeed, Time.deltaTime * SpeedChangeRate);
if (_animationBlend < 0.01f) _animationBlend = 0f;
// normalise input direction
Vector3 inputDirection = transform.forward;
// note: Vector2's != operator uses approximation so is not floating point error prone, and is cheaper than magnitude
// move the player
controller.Move(inputDirection.normalized * (speed * Time.deltaTime) +
new Vector3(0.0f, verticalVelocity, 0.0f) * Time.deltaTime);
_animator.SetFloat("Speed", _animationBlend);
_animator.SetFloat("MotionSpeed", inputMagnitude);
}
private void SetRayRotation(float upRay, float downRay)
{
Vector3 rotationAxis = Vector3.right;
float rotationAmount = 0f;
if (upRay == 1 && ray.transform.rotation.x < 0.57f)
{
rotationAmount = 30f;
}
else if (downRay == 1 && ray.transform.rotation.x > -0.57f)
{
rotationAmount = -30f;
}
ray.transform.Rotate(rotationAxis, rotationAmount * Time.deltaTime);
}
private void GroundedCheck()
{
// set sphere position, with offset
Vector3 spherePosition = new Vector3(transform.position.x, transform.position.y - GroundedOffset,
transform.position.z);
Grounded = Physics.CheckSphere(spherePosition, GroundedRadius, GroundLayers,
QueryTriggerInteraction.Ignore);
}
private void ApplyGravity()
{
if (verticalVelocity < terminalVelocity)
{
verticalVelocity += Gravity * Time.deltaTime;
}
}
private void Update()
{
if(myLife.GetLife() <= 0)
{
AddReward(-50f);
}
GroundedCheck();
ApplyGravity();
}
public bool GetAttack()
{
return attack;
}
public void SetAttack(bool attack)
{
this.attack = attack;
}
}
This is the player's code. Even though the player will eventually be controlled by a person, I decided to train the enemy AI against a player that is also an agent:
public class TraningPlayer : Agent
{
[SerializeField] private float moveSpeed = 4f;
[SerializeField] private float sprintSpeed = 10f;
[SerializeField] private float SpeedChangeRate = 10.0f;
[SerializeField] private GameObject ray;
[SerializeField] private float GroundedOffset = -0.14f;
[SerializeField] private float GroundedRadius = 0.25f;
[SerializeField] private LayerMask GroundLayers;
[SerializeField] private float Gravity = -15.0f;
[SerializeField] private LayerMask detectionLayer;
[SerializeField] private float detectionRadius = 10f;
[SerializeField] private float detectionAngle = 60f;
private CharacterController controller;
private PlayerLife myLife;
private float speed;
private float verticalVelocity;
private bool attack;
private bool Grounded = true;
private float terminalVelocity = 53.0f;
private float timerBeforAttack = 0.65f;
private float timer = 1.5f;
public override void Initialize()
{
controller = GetComponent<CharacterController>();
myLife = GetComponent<PlayerLife>();
}
public override void CollectObservations(VectorSensor sensor)
{
sensor.AddObservation(transform.localPosition);
sensor.AddObservation(myLife.GetLife());
sensor.AddObservation(ray.transform.rotation.x);
sensor.AddObservation(attack);
}
public override void OnActionReceived(ActionBuffers actions)
{
float moveRotate = actions.ContinuousActions[0];
float moveForward = actions.ContinuousActions[1];
float isSprint = actions.ContinuousActions[2];
if (actions.ContinuousActions[3] == 1) attack = true;
Debug.Log(attack);
float upRay = actions.ContinuousActions[4];
float downRay = actions.ContinuousActions[5];
SetRayRotation(upRay, downRay);
if (!attack)
{
Move(moveForward, isSprint);
}
transform.Rotate(0f, moveRotate * moveSpeed, 0f, Space.Self);
}
public override void Heuristic(in ActionBuffers actionsOut)
{
ActionSegment<float> continuosActions = actionsOut.ContinuousActions;
continuosActions[0] = Input.GetAxisRaw("Horizontal");
continuosActions[1] = Input.GetAxisRaw("Vertical");
continuosActions[2] = Input.GetKey(KeyCode.LeftShift) ? 1 : 0;
continuosActions[3] = Input.GetKey(KeyCode.Mouse0) ? 1 : 0;
continuosActions[4] = Input.GetKey(KeyCode.Alpha9) ? 1 : 0;
continuosActions[5] = Input.GetKey(KeyCode.Alpha0) ? 1 : 0;
}
private void Move(float moveForward, float isSprint)
{
// set target speed based on move speed, sprint speed and if sprint is pressed
float targetSpeed = moveSpeed;
if (isSprint == 1) targetSpeed = sprintSpeed;
if (moveForward == 0) targetSpeed = 0;
// a simplistic acceleration and deceleration designed to be easy to remove, replace, or iterate upon
// a reference to the players current horizontal velocity
float currentHorizontalSpeed = new Vector3(controller.velocity.x, 0.0f, controller.velocity.z).magnitude;
float speedOffset = 0.1f;
float inputMagnitude = 1f;
// accelerate or decelerate to target speed
if (currentHorizontalSpeed < targetSpeed - speedOffset ||
currentHorizontalSpeed > targetSpeed + speedOffset)
{
// creates curved result rather than a linear one giving a more organic speed change
// note T in Lerp is clamped, so we don't need to clamp our speed
speed = Mathf.Lerp(currentHorizontalSpeed, targetSpeed * inputMagnitude,
Time.deltaTime * SpeedChangeRate);
// round speed to 3 decimal places
speed = Mathf.Round(speed * 1000f) / 1000f;
}
else
{
speed = targetSpeed;
}
// normalise input direction
Vector3 inputDirection = transform.forward;
// note: Vector2's != operator uses approximation so is not floating point error prone, and is cheaper than magnitude
// move the player
controller.Move(inputDirection.normalized * (speed * Time.deltaTime) +
new Vector3(0.0f, verticalVelocity, 0.0f) * Time.deltaTime);
}
private void SetRayRotation(float upRay, float downRay)
{
Vector3 rotationAxis = Vector3.right;
float rotationAmount = 0f;
if (upRay == 1 && ray.transform.rotation.x < 0.57f)
{
rotationAmount = 30f;
}
else if (downRay == 1 && ray.transform.rotation.x > -0.57f)
{
rotationAmount = -30f;
}
ray.transform.Rotate(rotationAxis, rotationAmount * Time.deltaTime);
}
private void GroundedCheck()
{
// set sphere position, with offset
Vector3 spherePosition = new Vector3(transform.position.x, transform.position.y - GroundedOffset,
transform.position.z);
Grounded = Physics.CheckSphere(spherePosition, GroundedRadius, GroundLayers,
QueryTriggerInteraction.Ignore);
}
private void ApplyGravity()
{
if (verticalVelocity < terminalVelocity)
{
verticalVelocity += Gravity * Time.deltaTime;
}
}
private void Update()
{
if (myLife.GetLife() <= 0)
{
AddReward(-50f);
}
GroundedCheck();
ApplyGravity();
if (GetAttack())
{
timer -= Time.deltaTime;
timerBeforAttack -= Time.deltaTime;
if (timerBeforAttack <= 0)
{
DetectObjects();
timerBeforAttack = 100;
}
if (timer <= 0)
{
SetAttack(false);
timer = 2.4f;
timerBeforAttack = 1.2f;
}
}
}
public bool GetAttack()
{
return attack;
}
public void SetAttack(bool attack)
{
this.attack = attack;
}
void DetectObjects()
{
Collider[] colliders = Physics.OverlapSphere(transform.position, detectionRadius, detectionLayer);
foreach (Collider collider in colliders)
{
Vector3 directionToObject = collider.transform.position - transform.position;
float angleToObject = Vector3.Angle(transform.forward, directionToObject);
if (angleToObject < detectionAngle / 2f)
{
if (collider.gameObject.TryGetComponent<EnemyLife>(out EnemyLife enemyLife))
{
enemyLife.SetColpito(true);
}
}
}
}
}
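Both agent scripts read six ContinuousActions indices in OnActionReceived, so I assume the Behavior Parameters component on each prefab has to declare a matching continuous action space. The component below is purely a hypothetical sanity check I sketched for this question (ActionSpecSanityCheck is not part of my project); it just logs the declared action spec so it can be compared with the indices the scripts read:

using Unity.MLAgents.Actuators;
using Unity.MLAgents.Policies;
using UnityEngine;

// Illustrative helper only: logs the action space declared on the
// Behavior Parameters so it can be compared with the six
// ContinuousActions indices read in OnActionReceived.
public class ActionSpecSanityCheck : MonoBehaviour
{
    private void Awake()
    {
        var behavior = GetComponent<BehaviorParameters>();
        ActionSpec spec = behavior.BrainParameters.ActionSpec;
        Debug.Log($"Continuous actions: {spec.NumContinuousActions}, " +
                  $"discrete branches: {spec.NumDiscreteActions}");
    }
}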
And finally, this is the YAML configuration that drives the training for both agents:
behaviors:
  Jolleen:
    trainer_type: sac
    # SAC-specific configs (replaces the hyperparameters section above)
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 3.0e-4
      learning_rate_schedule: linear
      buffer_init_steps: 0
      tau: 0.005
      steps_per_update: 10.0
      save_replay_buffer: false
      init_entcoef: 0.5
      reward_signal_steps_per_update: 10.0
    # Configuration of the neural network (common to PPO/SAC)
    network_settings:
      vis_encode_type: simple
      normalize: false
      hidden_units: 128
      num_layers: 2
      # memory
      memory:
        sequence_length: 64
        memory_size: 256
    # Trainer configurations common to all trainers
    max_steps: 5000000000
    time_horizon: 64
    summary_freq: 10000
    keep_checkpoints: 5
    checkpoint_interval: 50000
    threaded: false
    init_path: null
    reward_signals:
      # environment reward (default)
      extrinsic:
        strength: 1.0
        gamma: 0.99
    # self-play
    self_play:
      window: 10
      play_against_latest_model_ratio: 0.5
      save_steps: 50000
      swap_steps: 2000
      team_change: 100000
  TrainingPlayer:
    trainer_type: sac
    # SAC-specific configs (replaces the hyperparameters section above)
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 3.0e-4
      learning_rate_schedule: linear
      buffer_init_steps: 0
      tau: 0.005
      steps_per_update: 10.0
      save_replay_buffer: false
      init_entcoef: 0.5
      reward_signal_steps_per_update: 10.0
    # Configuration of the neural network (common to PPO/SAC)
    network_settings:
      vis_encode_type: simple
      normalize: false
      hidden_units: 128
      num_layers: 2
      # memory
      memory:
        sequence_length: 64
        memory_size: 256
    # Trainer configurations common to all trainers
    max_steps: 5000000000
    time_horizon: 64
    summary_freq: 10000
    keep_checkpoints: 5
    checkpoint_interval: 50000
    threaded: false
    init_path: null
    reward_signals:
      # environment reward (default)
      extrinsic:
        strength: 1.0
        gamma: 0.99
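Because the Jolleen behavior has a self_play section, I assume the two opposing agents also need to be placed on different teams via their Behavior Parameters. The snippet below is only a sketch of that assumption (the team ids are placeholders; normally this is just set in the inspector):

using Unity.MLAgents.Policies;
using UnityEngine;

// Illustrative only: assigns a team id so ML-Agents can tell the two
// opposing sides apart during self-play. The default value is a placeholder.
public class TeamIdSetup : MonoBehaviour
{
    [SerializeField] private int teamId = 0; // e.g. 0 for Jolleen, 1 for the training player

    private void Awake()
    {
        GetComponent<BehaviorParameters>().TeamId = teamId;
    }
}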
I honestly don't know where to start; this is the first time I've tackled such a big project with AI, and until now I had only ever done small projects.