mountainclimber11

Reputation: 1400

gcp: after disk is *finished* mounting I cannot ssh to my instance

In Google Cloud Platform (GCP) after disk is finished mounting I cannot ssh to my instance.

details:

Via google cloud sdk shell cmd windows on a windows 7 PC I run the following at C:\>:

python "C:\Users\user's name\path\to\py\create_instance_working.py" --name inst-test2 --zone us-central1-a direct-topic-1234 cc-test1

which runs create_instance_working.py, which looks like this:

import argparse
import os
import time

import googleapiclient.discovery
from six.moves import input


# [START list_instances]
def list_instances(compute, project, zone):
    result = compute.instances().list(project=project, zone=zone).execute()
    return result['items']
# [END list_instances]


# [START create_instance]
def create_instance(compute, project, zone, name, bucket):

    image_response = compute.images().getFromFamily(
        project='direct-topic-1234', family='theFam').execute()
    source_disk_image = image_response['selfLink']

    machine_type = "projects/direct-topic-1234/zones/us-central1-a/machineTypes/n1-standard-4"
    script_path = os.path.join(
        os.path.dirname(__file__), 'startup-script_working.sh')
    with open(script_path, 'r') as f:
        startup_script = f.read()


    print(machine_type) 

    config = {
        'name': name,
        'machineType': machine_type,


        'disks': [
            {
                'boot': True,
                'autoDelete': True,
                'initializeParams': {
                    'sourceImage': source_disk_image,
                    'diskSizeGb': '15',
                }
            }, {

              "deviceName": "disk-2",
              "index": 1,
              "interface": "SCSI",
              "kind": "compute#attachedDisk",
              "mode": "READ_WRITE",
              "source": "projects/direct-topic-1234/zones/us-central1-a/disks/disk-2",
              "type": "PERSISTENT"
            }
        ],

        'networkInterfaces': [{
            'network': 'global/networks/default',
            'accessConfigs': [
                {'type': 'ONE_TO_ONE_NAT', 'name': 'External NAT'}
            ]
        }],


        "serviceAccounts": [
            {
              "email": "[email protected]",
              "scopes": [
                "https://www.googleapis.com/auth/devstorage.read_only",
                "https://www.googleapis.com/auth/logging.write",
                "https://www.googleapis.com/auth/monitoring.write",
                "https://www.googleapis.com/auth/servicecontrol",
                "https://www.googleapis.com/auth/service.management.readonly",
                "https://www.googleapis.com/auth/trace.append"
              ]
            }
          ],



        'metadata': {
            'items': [{
                'key': 'startup-script',
                'value': startup_script
            }, {
                'key': 'bucket',
                'value': bucket
            }]
        }
    }

    return compute.instances().insert(
        project=project,
        zone=zone,
        body=config).execute()
# [END create_instance]


# [START delete_instance]
def delete_instance(compute, project, zone, name):
    return compute.instances().delete(
        project=project,
        zone=zone,
        instance=name).execute()
# [END delete_instance]


# [START wait_for_operation]
def wait_for_operation(compute, project, zone, operation):
    print('Waiting for operation to finish...')
    while True:
        result = compute.zoneOperations().get(
            project=project,
            zone=zone,
            operation=operation).execute()

        if result['status'] == 'DONE':
            print("done.")
            if 'error' in result:
                raise Exception(result['error'])
            return result

        time.sleep(1)
# [END wait_for_operation]


# [START run]
def main(project, bucket, zone, instance_name, wait=True):
    compute = googleapiclient.discovery.build('compute', 'v1')

    print('Creating instance.')

    operation = create_instance(compute, project, zone, instance_name, bucket)
    wait_for_operation(compute, project, zone, operation['name'])

    instances = list_instances(compute, project, zone)

    print('Instances in project %s and zone %s:' % (project, zone))
    for instance in instances:
        print(' - ' + instance['name'])

    print("""
Instance created.
It will take a minute or two for the instance to complete work.
Check this URL: http://storage.googleapis.com/{}/output.png
Once the image is uploaded press enter to delete the instance.
""".format(bucket))

#     if wait:
#         input()
# 
#     print('Deleting instance.')
# 
#     operation = delete_instance(compute, project, zone, instance_name)
#     wait_for_operation(compute, project, zone, operation['name'])

    print('all done with instance.')

if __name__ == '__main__':

    print('in here 3')
    main('direct-topic-1234', 'cc-test1', 'us-central1-a', 'inst-test1')
    print('in here 4')
# [END run]

which calls a startup script (startup-script_working.sh) that looks like this:

sudo mkfs.ext4 -m 0 -F -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb
sudo mount -o discard,defaults /dev/sdb /var
sudo chmod a+w /var 
sudo cp /etc/fstab /etc/fstab.backup
echo UUID=`sudo blkid -s UUID -o value /dev/sdb` /var ext4 discard,defaults,nofail 0 2 | sudo tee -a /etc/fstab

Both of which were adapted from:

https://github.com/GoogleCloudPlatform/python-docs-samples/blob/master/compute/api/create_instance.py

In the GCP console, when I see the instance's green light I immediately click the instance's SSH button and successfully connect to the instance. If I keep opening new SSH connections, they all work until the mount on /var completes. I can tell the mount has completed by running df -h in an SSH session that still works (usually the first one): once it finishes, /dev/sdb 197G 60M 197G 1% /var shows up; before the failed attempts, that mount doesn't appear. But after it shows up, nothing will SSH to the instance. I've tried the >_ (shell) button in the console running gcloud compute ssh [instance name], and PuTTY with [user name]@[external IP].

I've tried just waiting 5 minutes to ssh to the instance (the mount would be finished by then), which doesn't work either.

IMPORTANT: If I comment out all of the startup script lines I can connect indefinitely, no SSH problems. I've tried creating a new disk and attaching that instead.

So it seems like it is something about the mount of the disk that is causing the ssh issues.

The early ssh connections continue to operate just fine, even though I can't create a new one.

When the ssh connection fails I get this in the ssh window:

"Connection Failed

An error occurred while communicating with the SSH server. Check the server and the network configuration."

Any ideas what would be causing this?

The instance's Linux distribution is SUSE 12.

My mounting instructions come from here:

https://cloud.google.com/compute/docs/disks/add-persistent-disk

If there is a good way to just avoid the situation, that would be helpful (please provide), but I'd really like to know what I am doing wrong.

I am new to GCP, the cloud generally, python, ssh, and Linux. (So new to everything in this question!)

If I comment out the startup script lines, run everything as described, ssh to the instance, and run the startup script commands manually, I get no errors. I still need to test whether I can create another ssh connection afterwards; I'll do that and report back.

Upvotes: 2

Views: 626

Answers (1)

mountainclimber11

Reputation: 1400

Mounting over /var, which contains data SSH depends on (among other things), makes everything that looks in /var see a blank disk. The existing data must be preserved (e.g. with cp -ar) somewhere else first, then the disk mounted on /var, and then the data copied back.
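A minimal sketch of a startup script that applies this preserve/mount/restore order, based on the script in the question (the device name /dev/sdb and the /var.bak staging path are assumptions, not something I've verified on your instance):

```
# Assumes the attached disk is /dev/sdb, as in the question's startup script.
# Format the disk (this destroys any data already on it).
sudo mkfs.ext4 -m 0 -F -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb

# Preserve the existing /var contents before mounting over them.
sudo cp -ar /var /var.bak

# Mount the new disk on /var, then restore the preserved data onto it.
sudo mount -o discard,defaults /dev/sdb /var
sudo cp -ar /var.bak/. /var/
sudo rm -rf /var.bak

# Persist the mount across reboots.
sudo cp /etc/fstab /etc/fstab.backup
echo UUID=`sudo blkid -s UUID -o value /dev/sdb` /var ext4 discard,defaults,nofail 0 2 | sudo tee -a /etc/fstab
```

Note that anything written to /var between the copy and the mount is lost; on a long-running system you would stop services first, but during a fresh instance's startup script that window is small.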

My previous answer was wrong.

Upvotes: 2
