Reputation: 1400
In Google Cloud Platform (GCP), after a disk finishes mounting I cannot ssh to my instance.
Details:
Via the Google Cloud SDK Shell (a cmd window) on a Windows 7 PC, I run the following at the C:\> prompt:
python "C:\Users\user's name\path\to\py\create_instance_working.py" --name inst-test2 --zone us-central1-a direct-topic-1234 cc-test1
which runs create_instance_working.py, which looks like this:
import argparse
import os
import time

import googleapiclient.discovery
from six.moves import input


# [START list_instances]
def list_instances(compute, project, zone):
    result = compute.instances().list(project=project, zone=zone).execute()
    return result['items']
# [END list_instances]


# [START create_instance]
def create_instance(compute, project, zone, name, bucket):
    image_response = compute.images().getFromFamily(
        project='direct-topic-1234', family='theFam').execute()
    source_disk_image = image_response['selfLink']

    machine_type = "projects/direct-topic-1234/zones/us-central1-a/machineTypes/n1-standard-4"
    startup_script = open(
        os.path.join(
            os.path.dirname(__file__), 'startup-script_working.sh'), 'r').read()
    print(machine_type)

    config = {
        'name': name,
        'machineType': machine_type,

        'disks': [
            {
                'boot': True,
                'autoDelete': True,
                'initializeParams': {
                    'sourceImage': source_disk_image,
                    'diskSizeGb': '15',
                }
            }, {
                "deviceName": "disk-2",
                "index": 1,
                "interface": "SCSI",
                "kind": "compute#attachedDisk",
                "mode": "READ_WRITE",
                "source": "projects/direct-topic-1234/zones/us-central1-a/disks/disk-2",
                "type": "PERSISTENT"
            }
        ],

        'networkInterfaces': [{
            'network': 'global/networks/default',
            'accessConfigs': [
                {'type': 'ONE_TO_ONE_NAT', 'name': 'External NAT'}
            ]
        }],

        "serviceAccounts": [
            {
                "email": "[email protected]",
                "scopes": [
                    "https://www.googleapis.com/auth/devstorage.read_only",
                    "https://www.googleapis.com/auth/logging.write",
                    "https://www.googleapis.com/auth/monitoring.write",
                    "https://www.googleapis.com/auth/servicecontrol",
                    "https://www.googleapis.com/auth/service.management.readonly",
                    "https://www.googleapis.com/auth/trace.append"
                ]
            }
        ],

        'metadata': {
            'items': [{
                'key': 'startup-script',
                'value': startup_script
            }, {
                'key': 'bucket',
                'value': bucket
            }]
        }
    }

    return compute.instances().insert(
        project=project,
        zone=zone,
        body=config).execute()
# [END create_instance]


# [START delete_instance]
def delete_instance(compute, project, zone, name):
    return compute.instances().delete(
        project=project,
        zone=zone,
        instance=name).execute()
# [END delete_instance]


# [START wait_for_operation]
def wait_for_operation(compute, project, zone, operation):
    print('Waiting for operation to finish...')
    while True:
        result = compute.zoneOperations().get(
            project=project,
            zone=zone,
            operation=operation).execute()

        if result['status'] == 'DONE':
            print("done.")
            if 'error' in result:
                raise Exception(result['error'])
            return result

        time.sleep(1)
# [END wait_for_operation]


# [START run]
def main(project, bucket, zone, instance_name, wait=True):
    compute = googleapiclient.discovery.build('compute', 'v1')

    print('Creating instance.')
    operation = create_instance(compute, project, zone, instance_name, bucket)
    wait_for_operation(compute, project, zone, operation['name'])

    instances = list_instances(compute, project, zone)
    print('Instances in project %s and zone %s:' % (project, zone))
    for instance in instances:
        print(' - ' + instance['name'])

    print("""
Instance created.
It will take a minute or two for the instance to complete work.
Check this URL: http://storage.googleapis.com/{}/output.png
Once the image is uploaded press enter to delete the instance.
""".format(bucket))

    # if wait:
    #     input()
    #
    # print('Deleting instance.')
    #
    # operation = delete_instance(compute, project, zone, instance_name)
    # wait_for_operation(compute, project, zone, operation['name'])

    print('all done with instance.')


if __name__ == '__main__':
    print('in here 3')
    main('direct-topic-1234', 'cc-test1', 'us-central1-a', 'inst-test1')
    print('in here 4')
# [END run]
which calls a startup script (startup-script_working.sh) that looks like this:
sudo mkfs.ext4 -m 0 -F -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb
sudo mount -o discard,defaults /dev/sdb /var
sudo chmod a+w /var
sudo cp /etc/fstab /etc/fstab.backup
echo UUID=`sudo blkid -s UUID -o value /dev/sdb` /var ext4 discard,defaults,nofail 0 2 | sudo tee -a /etc/fstab
Both of these were adapted from:
In the GCP console, when I see the instance's green light I immediately click the instance's SSH
button and successfully connect. If I keep opening new ssh connections to the instance, they all work until my mount on /var
completes. I can see that the mount has completed by running df -h
in an ssh connection that still works (usually the first one); the line /dev/sdb 197G 60M 197G 1% /var
shows up. Prior to the failed attempt that mount doesn't appear, but once it does, nothing will ssh to the instance. I tried the >_
(Cloud Shell) button in the console and ran gcloud compute ssh [instance name]
. I tried PuTTY with [user name]@[external IP]
.
I've tried just waiting 5 minutes to ssh to the instance (the mount would be finished by then), which doesn't work either.
IMPORTANT: If I comment out all of the startup script lines I can connect indefinitely, no SSH problems. I've tried creating a new disk and attaching that instead.
So it seems that something about mounting the disk is causing the ssh issues.
The early ssh connections continue to operate just fine, even though I can't create a new one.
When the ssh connection fails I get this in the ssh window: "Connection Failed
An error occurred while communicating with the SSH server. Check the server and the network configuration."
Any ideas what would be causing this?
The instance's Linux distribution is SUSE 12.
My mounting instructions come from here:
https://cloud.google.com/compute/docs/disks/add-persistent-disk
If there is a good way to just avoid the situation, that would be helpful (please share it), but I'd really like to know what I am doing wrong.
I am new to GCP, the cloud generally, python, ssh, and Linux. (So new to everything in this question!)
If I comment out the startup script lines, run everything as described, ssh to the instance, and run the startup script commands manually, I get no errors. I still need to test whether I can create another ssh connection afterwards; I'll do that and report back.
Upvotes: 2
Views: 626
Reputation: 1400
Mounting on /var
, which contains data associated with ssh
(among other things), means that anything that reaches for /var
sees a blank disk. The existing data must first be preserved elsewhere (cp -ar
), then the disk mounted on /var, and finally the data moved back.
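For illustration, a minimal sketch of a startup script that preserves the existing /var contents before switching the mount over. It assumes the new disk is /dev/sdb (as in the question) and uses a hypothetical temporary mountpoint /mnt/newvar; adjust both to your setup.
#!/bin/bash
# Sketch only: format the new disk first (this destroys any data already on it).
sudo mkfs.ext4 -m 0 -F -E lazy_itable_init=0,lazy_journal_init=0,discard /dev/sdb

# 1. Mount the new disk at a temporary mountpoint (hypothetical path).
sudo mkdir -p /mnt/newvar
sudo mount -o discard,defaults /dev/sdb /mnt/newvar

# 2. Copy the current /var contents (sshd runtime data, logs, etc.)
#    onto the new disk, preserving ownership and permissions.
sudo cp -ar /var/. /mnt/newvar/

# 3. Switch the mount over to /var.
sudo umount /mnt/newvar
sudo mount -o discard,defaults /dev/sdb /var

# 4. Persist the mount across reboots.
sudo cp /etc/fstab /etc/fstab.backup
echo UUID=$(sudo blkid -s UUID -o value /dev/sdb) /var ext4 discard,defaults,nofail 0 2 | sudo tee -a /etc/fstab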
My previous answer was wrong.
Upvotes: 2