Giainus9

Reputation: 1

How can I resolve this error with my ML-Agents agent?

I'm creating a video game that uses artificial intelligence for the enemies. I had just started the training session and everything was running fine, but about 10 seconds later it gave me this error:

C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\torch_entities\utils.py:289: UserWarning: The use of `x.T` on tensors of dimension other than 2 to reverse their shape is deprecated and it will throw an error in a future release. Consider `x.mT` to transpose batches of matrices or `x.permute(*torch.arange(x.ndim - 1, -1, -1))` to reverse the dimensions of a tensor. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\TensorShape.cpp:3641.)
  torch.nn.functional.one_hot(_act.T, action_size[i]).float()
C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\torch\onnx\symbolic_opset9.py:4662: UserWarning: Exporting a model to ONNX with a batch_size other than 1, with a variable length with LSTM can cause an error when running the ONNX model with a different batch size. Make sure to save the model with a batch size of 1, or define the initial states (h0/c0) as inputs of the model.
  warnings.warn(
[INFO] Exported results\=Jolleen2\Jolleen\Jolleen-1152.onnx
[INFO] Copied results\=Jolleen2\Jolleen\Jolleen-1152.onnx to results\=Jolleen2\Jolleen.onnx.
[INFO] Exported results\=Jolleen2\TrainingPlayer\TrainingPlayer-192.onnx
[INFO] Copied results\=Jolleen2\TrainingPlayer\TrainingPlayer-192.onnx to results\=Jolleen2\TrainingPlayer.onnx.
Traceback (most recent call last):
  File "C:\Users\Utente\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Utente\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\Utente\Fight For Life\MLvenv\Scripts\mlagents-learn.exe\__main__.py", line 7, in <module>
  File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\learn.py", line 264, in main
    run_cli(parse_command_line())
  File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\learn.py", line 260, in run_cli
    run_training(run_seed, options, num_areas)
  File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\learn.py", line 136, in run_training
    tc.start_learning(env_manager)
  File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\trainer_controller.py", line 175, in start_learning
    n_steps = self.advance(env_manager)
  File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\trainer_controller.py", line 250, in advance
    trainer.advance()
  File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\ghost\trainer.py", line 254, in advance
    self.trainer.advance()
  File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\trainer\rl_trainer.py", line 302, in advance
    if self._update_policy():
  File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\trainer\off_policy_trainer.py", line 211, in _update_policy
    update_stats = self.optimizer.update(sampled_minibatch, n_sequences)
  File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents_envs\timers.py", line 305, in wrapped
    return func(*args, **kwargs)
  File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\sac\optimizer_torch.py", line 573, in update
    q1_stream = self._condense_q_streams(q1_out, disc_actions)
  File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\sac\optimizer_torch.py", line 467, in _condense_q_streams
    branched_q = ModelUtils.break_into_branches(
  File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\torch_entities\utils.py", line 270, in break_into_branches
    branched_logits = [
  File "C:\Users\Utente\Fight For Life\MLvenv\lib\site-packages\mlagents\trainers\torch_entities\utils.py", line 271, in <listcomp>
    concatenated_logits[:, action_idx[i] : action_idx[i + 1]]
IndexError: too many indices for tensor of dimension 1

This is the code for my opponent (the enemy agent):

using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;

public class Jolleen : Agent
{
    [SerializeField] private float moveSpeed = 4f;
    [SerializeField] private float sprintSpeed = 10f;
    [SerializeField] private float SpeedChangeRate = 10.0f;
    [SerializeField] private GameObject ray;
    [SerializeField] private float GroundedOffset = -0.14f;
    [SerializeField] private float GroundedRadius = 0.25f;
    [SerializeField] private LayerMask GroundLayers;
    [SerializeField] private float Gravity = -15.0f;

    private CharacterController controller;
    private Animator _animator;
    private EnemyLife myLife;
    private float speed;
    private float _animationBlend;
    private float verticalVelocity;
    private bool attack;
    private bool Grounded = true;
    private float terminalVelocity = 53.0f;

    public override void Initialize()
    {
        controller = GetComponent<CharacterController>();
        _animator = GetComponent<Animator>();
        myLife = GetComponent<EnemyLife>();
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(transform.localPosition);
        sensor.AddObservation(myLife.GetLife());
        sensor.AddObservation(ray.transform.rotation.x);
        sensor.AddObservation(attack);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
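        // The agent uses six continuous action slots:
        // [0] body rotation, [1] forward movement, [2] sprint flag,
        // [3] attack flag, [4] tilt ray up, [5] tilt ray down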
        float moveRotate = actions.ContinuousActions[0];
        float moveForward = actions.ContinuousActions[1];
        float isSprint = actions.ContinuousActions[2];
        if (actions.ContinuousActions[3] == 1)  attack = true;
        Debug.Log(attack);
        float upRay = actions.ContinuousActions[4];
        float downRay = actions.ContinuousActions[5];
        SetRayRotation(upRay, downRay);

        if (!attack)
        {
            Move(moveForward, isSprint);
        }

        transform.Rotate(0f, moveRotate * moveSpeed, 0f, Space.Self);
    }

    public override void Heuristic(in ActionBuffers actionsOut)
    {
        ActionSegment<float> continuosActions = actionsOut.ContinuousActions;
        continuosActions[0] = Input.GetAxisRaw("Horizontal");
        continuosActions[1] = Input.GetAxisRaw("Vertical");
        continuosActions[2] = Input.GetKey(KeyCode.LeftShift) ? 1 : 0;
        continuosActions[3] = Input.GetKey(KeyCode.Mouse0) ? 1 : 0;
        continuosActions[4] = Input.GetKey(KeyCode.Alpha9) ? 1 : 0;
        continuosActions[5] = Input.GetKey(KeyCode.Alpha0) ? 1 : 0;
    }

    private void Move(float moveForward, float isSprint)
    {
        // set target speed based on move speed, sprint speed and if sprint is pressed
        float targetSpeed = moveSpeed;

        if (isSprint == 1) targetSpeed = sprintSpeed;

        if (moveForward == 0) targetSpeed = 0;

        // a simplistic acceleration and deceleration designed to be easy to remove, replace, or iterate upon

        // a reference to the players current horizontal velocity
        float currentHorizontalSpeed = new Vector3(controller.velocity.x, 0.0f, controller.velocity.z).magnitude;

        float speedOffset = 0.1f;
        float inputMagnitude = 1f;

        // accelerate or decelerate to target speed
        if (currentHorizontalSpeed < targetSpeed - speedOffset ||
            currentHorizontalSpeed > targetSpeed + speedOffset)
        {
            // creates curved result rather than a linear one giving a more organic speed change
            // note T in Lerp is clamped, so we don't need to clamp our speed
            speed = Mathf.Lerp(currentHorizontalSpeed, targetSpeed * inputMagnitude,
                Time.deltaTime * SpeedChangeRate);

            // round speed to 3 decimal places
            speed = Mathf.Round(speed * 1000f) / 1000f;
        }
        else
        {
            speed = targetSpeed;
        }

        _animationBlend = Mathf.Lerp(_animationBlend, targetSpeed, Time.deltaTime * SpeedChangeRate);
        if (_animationBlend < 0.01f) _animationBlend = 0f;

        // normalise input direction
        Vector3 inputDirection = transform.forward;

        // note: Vector2's != operator uses approximation so is not floating point error prone, and is cheaper than magnitude
        // move the player
        controller.Move(inputDirection.normalized * (speed * Time.deltaTime) +
                         new Vector3(0.0f, verticalVelocity, 0.0f) * Time.deltaTime);

        _animator.SetFloat("Speed", _animationBlend);
        _animator.SetFloat("MotionSpeed", inputMagnitude);
    }

    private void SetRayRotation(float upRay, float downRay)
    {
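        // Rotate the ray object around its local right axis at 30 degrees per second,
        // as long as its rotation.x component stays roughly within ±0.57.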
        Vector3 rotationAxis = Vector3.right;
        float rotationAmount = 0f;

        if (upRay == 1 && ray.transform.rotation.x < 0.57f)
        {
            rotationAmount = 30f;
        }
        else if (downRay == 1 && ray.transform.rotation.x > -0.57f)
        {
            rotationAmount = -30f;
        }

        ray.transform.Rotate(rotationAxis, rotationAmount * Time.deltaTime);
    }

    private void GroundedCheck()
    {
        // set sphere position, with offset
        Vector3 spherePosition = new Vector3(transform.position.x, transform.position.y - GroundedOffset,
            transform.position.z);
        Grounded = Physics.CheckSphere(spherePosition, GroundedRadius, GroundLayers,
            QueryTriggerInteraction.Ignore);
    }

    private void ApplyGravity()
    {
        if (verticalVelocity < terminalVelocity)
        {
            verticalVelocity += Gravity * Time.deltaTime;
        }
    }


    private void Update()
    {
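        // Apply a large negative reward on every frame in which the agent's life is at or below zero.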
        if(myLife.GetLife() <= 0)
        {
            AddReward(-50f);
        }
        GroundedCheck();
        ApplyGravity();
    }

    public bool GetAttack()
    {
        return attack;
    }

    public void SetAttack(bool attack)
    {
        this.attack = attack;
    }
}

This is the player's code. Even though in the final game the player will be controlled by a person, I decided to train the enemy AI against a player that is itself an agent:

using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;

public class TraningPlayer : Agent
{
    [SerializeField] private float moveSpeed = 4f;
    [SerializeField] private float sprintSpeed = 10f;
    [SerializeField] private float SpeedChangeRate = 10.0f;
    [SerializeField] private GameObject ray;
    [SerializeField] private float GroundedOffset = -0.14f;
    [SerializeField] private float GroundedRadius = 0.25f;
    [SerializeField] private LayerMask GroundLayers;
    [SerializeField] private float Gravity = -15.0f;
    [SerializeField] private LayerMask detectionLayer;
    [SerializeField] private float detectionRadius = 10f;
    [SerializeField] private float detectionAngle = 60f;

    private CharacterController controller;
    private PlayerLife myLife;
    private float speed;
    private float verticalVelocity;
    private bool attack;
    private bool Grounded = true;
    private float terminalVelocity = 53.0f;
    private float timerBeforAttack = 0.65f;
    private float timer = 1.5f;

    public override void Initialize()
    {
        controller = GetComponent<CharacterController>();
        myLife = GetComponent<PlayerLife>();
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(transform.localPosition);
        sensor.AddObservation(myLife.GetLife());
        sensor.AddObservation(ray.transform.rotation.x);
        sensor.AddObservation(attack);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
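        // Same six-slot continuous action layout as the Jolleen agent above.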
        float moveRotate = actions.ContinuousActions[0];
        float moveForward = actions.ContinuousActions[1];
        float isSprint = actions.ContinuousActions[2];
        if (actions.ContinuousActions[3] == 1) attack = true;
        Debug.Log(attack);
        float upRay = actions.ContinuousActions[4];
        float downRay = actions.ContinuousActions[5];
        SetRayRotation(upRay, downRay);

        if (!attack)
        {
            Move(moveForward, isSprint);
        }

        transform.Rotate(0f, moveRotate * moveSpeed, 0f, Space.Self);
    }

    public override void Heuristic(in ActionBuffers actionsOut)
    {
        ActionSegment<float> continuosActions = actionsOut.ContinuousActions;
        continuosActions[0] = Input.GetAxisRaw("Horizontal");
        continuosActions[1] = Input.GetAxisRaw("Vertical");
        continuosActions[2] = Input.GetKey(KeyCode.LeftShift) ? 1 : 0;
        continuosActions[3] = Input.GetKey(KeyCode.Mouse0) ? 1 : 0;
        continuosActions[4] = Input.GetKey(KeyCode.Alpha9) ? 1 : 0;
        continuosActions[5] = Input.GetKey(KeyCode.Alpha0) ? 1 : 0;
    }

    private void Move(float moveForward, float isSprint)
    {
        // set target speed based on move speed, sprint speed and if sprint is pressed
        float targetSpeed = moveSpeed;

        if (isSprint == 1) targetSpeed = sprintSpeed;

        if (moveForward == 0) targetSpeed = 0;

        // a simplistic acceleration and deceleration designed to be easy to remove, replace, or iterate upon

        // a reference to the players current horizontal velocity
        float currentHorizontalSpeed = new Vector3(controller.velocity.x, 0.0f, controller.velocity.z).magnitude;

        float speedOffset = 0.1f;
        float inputMagnitude = 1f;

        // accelerate or decelerate to target speed
        if (currentHorizontalSpeed < targetSpeed - speedOffset ||
            currentHorizontalSpeed > targetSpeed + speedOffset)
        {
            // creates curved result rather than a linear one giving a more organic speed change
            // note T in Lerp is clamped, so we don't need to clamp our speed
            speed = Mathf.Lerp(currentHorizontalSpeed, targetSpeed * inputMagnitude,
                Time.deltaTime * SpeedChangeRate);

            // round speed to 3 decimal places
            speed = Mathf.Round(speed * 1000f) / 1000f;
        }
        else
        {
            speed = targetSpeed;
        }

        // normalise input direction
        Vector3 inputDirection = transform.forward;

        // note: Vector2's != operator uses approximation so is not floating point error prone, and is cheaper than magnitude
        // move the player
        controller.Move(inputDirection.normalized * (speed * Time.deltaTime) +
                         new Vector3(0.0f, verticalVelocity, 0.0f) * Time.deltaTime);

    }

    private void SetRayRotation(float upRay, float downRay)
    {
        Vector3 rotationAxis = Vector3.right;
        float rotationAmount = 0f;

        if (upRay == 1 && ray.transform.rotation.x < 0.57f)
        {
            rotationAmount = 30f;
        }
        else if (downRay == 1 && ray.transform.rotation.x > -0.57f)
        {
            rotationAmount = -30f;
        }

        ray.transform.Rotate(rotationAxis, rotationAmount * Time.deltaTime);
    }

    private void GroundedCheck()
    {
        // set sphere position, with offset
        Vector3 spherePosition = new Vector3(transform.position.x, transform.position.y - GroundedOffset,
            transform.position.z);
        Grounded = Physics.CheckSphere(spherePosition, GroundedRadius, GroundLayers,
            QueryTriggerInteraction.Ignore);
    }

    private void ApplyGravity()
    {
        if (verticalVelocity < terminalVelocity)
        {
            verticalVelocity += Gravity * Time.deltaTime;
        }
    }


    private void Update()
    {
        if (myLife.GetLife() <= 0)
        {
            AddReward(-50f);
        }
        GroundedCheck();
        ApplyGravity();
        if (GetAttack())
        {
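            // While attacking: once timerBeforAttack runs out, apply the hit a single time
            // (it is then pushed far into the future so it cannot fire again this attack);
            // once timer runs out, end the attack and reset both timers.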

            timer -= Time.deltaTime;
            timerBeforAttack -= Time.deltaTime;

            if (timerBeforAttack <= 0)
            {
                DetectObjects();
                timerBeforAttack = 100;
            }

            if (timer <= 0)
            {
                SetAttack(false);
                timer = 2.4f;
                timerBeforAttack = 1.2f;
            }

        }
    }

    public bool GetAttack()
    {
        return attack;
    }

    public void SetAttack(bool attack)
    {
        this.attack = attack;
    }

    void DetectObjects()
    {
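        // Gather colliders on the detection layer within detectionRadius and mark any EnemyLife
        // found inside the forward detection cone as hit.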
        Collider[] colliders = Physics.OverlapSphere(transform.position, detectionRadius, detectionLayer);

        foreach (Collider collider in colliders)
        {
            Vector3 directionToObject = collider.transform.position - transform.position;
            float angleToObject = Vector3.Angle(transform.forward, directionToObject);

            if (angleToObject < detectionAngle / 2f)
            {
                if (collider.gameObject.TryGetComponent<EnemyLife>(out EnemyLife enemyLife))
                {
                    enemyLife.SetColpito(true);
                }
            }
        }
    }
}

And finally, this is the YAML configuration that manages the two agents:

behaviors:
  Jolleen:
    trainer_type: sac

    # SAC-specific configs (replaces the hyperparameters section above)
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 3.0e-4
      learning_rate_schedule: linear

      buffer_init_steps: 0
      tau: 0.005
      steps_per_update: 10.0
      save_replay_buffer: false
      init_entcoef: 0.5
      reward_signal_steps_per_update: 10.0

    # Configuration of the neural network (common to PPO/SAC)
    network_settings:
      vis_encode_type: simple
      normalize: false
      hidden_units: 128
      num_layers: 2
      # memory
      memory:
        sequence_length: 64
        memory_size: 256

    # Trainer configurations common to all trainers
    max_steps: 5000000000
    time_horizon: 64
    summary_freq: 10000
    keep_checkpoints: 5
    checkpoint_interval: 50000
    threaded: false
    init_path: null

    reward_signals:
      # environment reward (default)
      extrinsic:
        strength: 1.0
        gamma: 0.99

    # self-play
    self_play:
      window: 10
      play_against_latest_model_ratio: 0.5
      save_steps: 50000
      swap_steps: 2000
      team_change: 100000
  
  TrainingPlayer:
    trainer_type: sac

    # SAC-specific configs (replaces the hyperparameters section above)
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 3.0e-4
      learning_rate_schedule: linear

      buffer_init_steps: 0
      tau: 0.005
      steps_per_update: 10.0
      save_replay_buffer: false
      init_entcoef: 0.5
      reward_signal_steps_per_update: 10.0

    # Configuration of the neural network (common to PPO/SAC)
    network_settings:
      vis_encode_type: simple
      normalize: false
      hidden_units: 128
      num_layers: 2
      # memory
      memory:
        sequence_length: 64
        memory_size: 256

    # Trainer configurations common to all trainers
    max_steps: 5000000000
    time_horizon: 64
    summary_freq: 10000
    keep_checkpoints: 5
    checkpoint_interval: 50000
    threaded: false
    init_path: null

    reward_signals:
      # environment reward (default)
      extrinsic:
        strength: 1.0
        gamma: 0.99
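
In case it helps, this is roughly how I launch the training from the MLvenv virtual environment (the config file name below is just a placeholder for my actual file; the run id is the one that appears in the results folder in the log above):

mlagents-learn <my_trainer_config>.yaml --run-id "=Jolleen2"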

I honestly don't know where to start, because this is the first time I've tackled such a big AI project; until now I had only done small ones.

Upvotes: 0

Views: 97

Answers (0)
