Vojtěch

Reputation: 12426

Google Cloud not managing users/SSH in VMs

We upgraded the Debian distribution on a Google Cloud instance, and it seems Google Cloud can no longer manage the users and their SSH keys on the instance.

I have installed following tools:

  1. I cannot connect through the UI. It gets stuck on "Transferring SSH keys to the instance". The "troubleshooting" tool says that everything is fine.
  2. When I try to connect via gcloud compute ssh, it fails with

    permission denied (publickey)

  3. I still have access to the instance with some other user, but no new users are created and no SSH keys are transferred.

What else am I missing?

EDIT:

Have you added the SSH key to Project metadata or Instance metadata? If it's instance metadata, are project-level SSH keys blocked?

I haven't added any metadata.

Does your user account have the necessary permissions in the project to SSH to the instance (e.g. the Owner, Editor, or Compute Instance Admin IAM role)?

Yes, this worked correctly until the Debian upgrade to bookworm. I could see that all the Google Cloud related packages had been removed, and I had to install them again.

Are you able to SSH to the instance using an SSH client, e.g. PuTTY? If yes, you need to make sure the Google account manager daemon is running on the instance.

I can SSH without problems using accounts that were active on the machine BEFORE the Debian upgrade. These accounts already have a .ssh directory correctly set up and working. New Google users cannot log in.

Try gcloud beta compute ssh --zone ZONE INSTANCE_NAME --project PROJECT

This works only for users active before the Debian upgrade.

If yes, you need to make sure the Google account manager daemon is running on the instance.

I installed the google-compute-engine-oslogin package, which was missing, but it seems to have no effect and new users still cannot log in.

EDIT2:

When connecting to the serial console, it gets stuck on:

    csearch-dev google_guest_agent[2839775]: ERROR non_windows_accounts.go:158 Error updating SSH keys for gke-495d6b605cf336a7b160: mkdir /home/gke-495d6b605cf336a7b160/.ssh: no such file or directory.

This is the same issue: SSH keys are never transferred into the instance.

Upvotes: 3

Views: 1072

Answers (1)

Veera Nagireddy

Reputation: 1904

There are a few things you can do to troubleshoot the Permission denied (publickey) error message:

To start, you must ensure that you have properly authenticated yourself with gcloud using an IAM user with the Compute Instance Admin role. You can do that by running gcloud auth login [USER], then try gcloud compute ssh again.

You can also verify that the Linux Guest Environment scripts are properly installed and running. Please refer to this page for information about validating, updating, or manually installing the guest environment.
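As a rough sketch of that verification on Debian, the helper below reads a list of installed package names and prints any expected guest environment package that is absent. The function name missing_guest_packages and the EXPECTED_PACKAGES list are illustrative assumptions (the list is written from memory of Google's documentation, so check it against the page for your image).

```shell
#!/bin/sh
# Assumption: the guest environment packages expected on a Debian image.
# Verify this list against Google's guest environment documentation.
EXPECTED_PACKAGES="google-compute-engine google-compute-engine-oslogin google-guest-agent google-osconfig-agent"

# missing_guest_packages (hypothetical helper): reads installed package
# names on stdin, one per line, and prints each expected package that
# does not appear in that list.
missing_guest_packages() {
    installed=$(cat)
    for pkg in $EXPECTED_PACKAGES; do
        # -x matches whole lines only, -F treats the name as a fixed string.
        printf '%s\n' "$installed" | grep -qxF -- "$pkg" || printf '%s\n' "$pkg"
    done
}

# Usage on the instance (not executed here):
#   dpkg-query -W -f='${db:Status-Abbrev} ${Package}\n' \
#     | awk '$1 == "ii" { print $2 }' | missing_guest_packages
#   systemctl is-active google-guest-agent
```

Filtering on the "ii" status matters after a distribution upgrade, because removed packages that left configuration files behind would otherwise still be listed.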

Another possibility is that the private key was lost or that we have a mismatched keypair. To force gcloud to generate a new SSH keypair, you must first move ~/.ssh/google_compute_engine and ~/.ssh/google_compute_engine.pub if present, for example:

mv ~/.ssh/google_compute_engine.pub ~/.ssh/google_compute_engine.pub.old
mv ~/.ssh/google_compute_engine ~/.ssh/google_compute_engine.old

Once that is done, you may then try gcloud compute ssh [INSTANCE-NAME] again. A new key pair should be created and its public key added to the SSH keys metadata.
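If you want the "if present" check handled for you, the move can be wrapped in a small function. This is only a sketch: backup_gcloud_keys is a hypothetical name, and it overwrites any existing .old files.

```shell
#!/bin/sh
# backup_gcloud_keys [DIR] (hypothetical helper): rename the gcloud key
# pair in DIR (default: ~/.ssh) to *.old, skipping files that are absent.
# Note: an existing *.old file is overwritten.
backup_gcloud_keys() {
    dir=${1:-"$HOME/.ssh"}
    for name in google_compute_engine google_compute_engine.pub; do
        if [ -e "$dir/$name" ]; then
            mv -- "$dir/$name" "$dir/$name.old"
        fi
    done
}

# Usage (not executed here):
#   backup_gcloud_keys
#   gcloud compute ssh [INSTANCE-NAME]
```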

Refer to Sunny-j and Answer to review the serial-port logs of the affected instance for possible clues on the issue. Also refer to Resolving getting locked out of a Compute Engine for more information.

EDIT1: Refer to this similar SO question and Troubleshooting using the serial console, which may help resolve your error.

EDIT2: Maybe you have git-all installed. It disrupts cloud-init and virtually every step of the boot process, because the older SysV init system takes the place of systemd. As a result, you are unable to SSH into your instance.

Check out these potential solutions to the above problem:

1. Try using git instead of git-all.

2. If git-all is necessary, use apt install --no-install-recommends -y git-all to prevent the installation of recommended packages.
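To see whether the init system was in fact replaced, one option is to look at the command name of PID 1 from a session that still works. The sketch below assumes that a healthy instance reports systemd there; classify_init is a hypothetical helper name.

```shell
#!/bin/sh
# classify_init NAME (hypothetical helper): given the command name of
# PID 1, report whether it is systemd or something else (e.g. SysV init).
classify_init() {
    case "$1" in
        systemd) echo "systemd" ;;
        *)       echo "not-systemd" ;;
    esac
}

# Usage on the instance (not executed here):
#   classify_init "$(ps -p 1 -o comm=)"
```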

Finally: If you were previously able to SSH into the instance with a particular SSH key for new users, either the SSH daemon was not running or was otherwise broken, or you somehow removed that SSH key. It would appear that you damaged this machine during the upgrade.

Why is this particular VM instance required? Does it contain significant data? If so, you can turn it off, mount its disk on a new VM instance, and copy that data off. (I'd recommend building another machine running these services from the latest snapshot or from scratch, and using that instead.)

You should probably move to a new machine if it runs a service: There is no way to tell what still works and what doesn't, even if you are able to access the instance.

Upvotes: 1
