Unable to access the VM after first reboot when using Startup Script

I am using the following startup script on a CentOS 7 VM in GCP. The URL is accessible after the first boot, but if I reboot the machine, or stop and then start it, the machine becomes inaccessible and the URL stops working. I thought this might be due to SELinux, so I added code to disable it, but the result is the same. I tried this on several newly created VMs, so there seems to be something I am unable to figure out. When I run the script manually on the VM and reboot several times, I don't face any issue.

#!/bin/bash -xe

# introducing sleep so network interfaces and routes can get ready before fetching software
sleep 10

if rpm -q --quiet httpd; then
  echo "installed"
else
  yum update -y
  yum install -y httpd php php-common
  setenforce 0
  sed -i.bak -e 's/^SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config

cat > /var/www/html/index.php <<'EOF'
<?php
function metadata_value($value) {
    $opts = array(
        "http" => array(
            "method" => "GET",
            "header" => "Metadata-Flavor: Google"
        )
    );
    $context = stream_context_create($opts);
    $content = file_get_contents("http://metadata/computeMetadata/v1/$value", false, $context);
    return $content;
}
if ($_SERVER['HTTP_X_FORWARDED_PROTO'] == "http") {
        $redirect = 'https://' . $_SERVER['HTTP_HOST'] . $_SERVER['REQUEST_URI'];
        header('HTTP/1.1 301 Moved Permanently');
        header('Location: ' . $redirect);
        exit();
}
?>

<!doctype html>
<html>
<head>
<!-- Compiled and minified CSS -->
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/materialize/0.97.0/css/materialize.min.css">

<!-- Compiled and minified JavaScript -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/materialize/0.97.0/js/materialize.min.js"></script>
<title>Frontend Web Server</title>
</head>
<body>
<div class="container">
<div class="row">
<div class="col s2">&nbsp;</div>
<div class="col s8">

<img src="/assets/gcp-logo.svg"/>

<div class="card blue">
<div class="card-content white-text">
<div class="card-title">Backend that serviced this request</div>
</div>
<div class="card-content white">
<table class="bordered">
  <tbody>
    <tr>
      <td>Name</td>
      <td><?php printf(metadata_value("instance/name")) ?></td>
    </tr>
    <tr>
      <td>ID</td>
      <td><?php printf(metadata_value("instance/id")) ?></td>
    </tr>
    <tr>
      <td>Hostname</td>
      <td><?php printf(metadata_value("instance/hostname")) ?></td>
    </tr>
    <tr>
      <td>Zone</td>
      <td><?php printf(metadata_value("instance/zone")) ?></td>
    </tr>
    <tr>
      <td>Machine Type</td>
      <td><?php printf(metadata_value("instance/machine-type")) ?></td>
    </tr>
    <tr>
      <td>Project</td>
      <td><?php printf(metadata_value("project/project-id")) ?></td>
    </tr>
    <tr>
      <td>Internal IP</td>
      <td><?php printf(metadata_value("instance/network-interfaces/0/ip")) ?></td>
    </tr>
    <tr>
      <td>External IP</td>
      <td><?php printf(metadata_value("instance/network-interfaces/0/access-configs/0/external-ip")) ?></td>
    </tr>
  </tbody>
</table>
</div>
</div>

<div class="card blue">
<div class="card-content white-text">
<div class="card-title">Proxy that handled this request</div>
</div>
<div class="card-content white">
<table class="bordered">
  <tbody>
    <tr>
      <td>Address</td>
      <td><?php printf($_SERVER["HTTP_HOST"]); ?></td>
    </tr>
  </tbody>
</table>
</div>

</div>
</div>
<div class="col s2">&nbsp;</div>
</div>
</div>
</body>
</html>
EOF

mkdir -p /var/www/html/group1 && cp /var/www/html/index.php /var/www/html/group1/index.php

systemctl enable httpd
systemctl restart httpd

fi

On the serial console I can see the following output:

serialport: Connected to mytower.us-central1-a.centos7 port 1 (session ID: 405c4d17b926f0906f45a53784d4abd379d6480d, active connections: 1).
DS   - 0000000000000030, ES  - 0000000000000030, FS  - 0000000000000030
GS   - 0000000000000030, SS  - 0000000000000030
CR0  - 0000000080010033, CR2 - 0000000000000000, CR3 - 00000000BF401000
CR4  - 0000000000000668, CR8 - 0000000000000000
DR0  - 0000000000000000, DR1 - 0000000000000000, DR2 - 0000000000000000
DR3  - 0000000000000000, DR6 - 00000000FFFF0FF0, DR7 - 0000000000000400
GDTR - 00000000BF3EEA98 0000000000000047, LDTR - 0000000000000000
IDTR - 00000000BEE1F018 0000000000000FFF,   TR - 0000000000000000
FXSAVE_STATE - 00000000BFF39AB0
!!!! Find image based on IP(0xBF2E6D5C) /build/work/af60adde42b1d1ad5be2a01e4924bb905248/google3/blaze-out/k8-opt/genfiles/third_party/edk2/ovmf_x64_csm_debug_workspace_dir/ovmf_x64_csm_debug_edk2_files_dir/Build/OvmfX64/DEBUG_CLANG38/X64/OvmfPkg/8254TimerDxe/8254Timer/DEBUG/Timer.dll (ImageBase=00000000BF2E5000, EntryPoint=00000000BF2E6AB5) !!!!

Has anyone faced this issue? Please help me figure out why the VM is not accessible after a reboot.

Thanks

Upvotes: 2

Answers (3)

d.s

Reputation: 189

Solution for RHEL7

RHEL 7 instances that have not been rebooted yet:

Run the following command to downgrade the offending packages, then reboot:

yum downgrade shim\* grub2\* mokutil

Servers that do not come back up after a reboot:

1. Detach the boot disk from the affected instance.
2. Attach the disk to a RHEL system that has shim-x64-15-2.el7.x86_64 installed. You can provision a new RHEL 7 VM for this; the RHEL 7 image on GCP ships with the shim-x64-15-2.el7.x86_64 package.
3. After attaching the disk, run the following command to find the partitions of the newly attached disk:

lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0    20G  0 disk
├─sda1   8:1    0   200M  0 part /boot/efi
└─sda2   8:2    0  19.8G  0 part /
sdb      8:16   0    50G  0 disk
├─sdb1   8:17   0   200M  0 part            <~~~~~~~~
└─sdb2   8:18   0  49.8G  0 part

4. Mount the EFI partition of the attached disk (sdb1 in the output above):

mount /dev/sdb1 /mnt

5. Copy BOOTX64.EFI:

cp -f /boot/efi/EFI/BOOT/BOOTX64.EFI /mnt/EFI/BOOT/BOOTX64.EFI

6. Copy all .efi files:

cp -f /boot/efi/EFI/redhat/*.efi /mnt/EFI/redhat/ #for RHEL7

cp -f /boot/efi/EFI/centos/*.efi /mnt/EFI/centos/ #for CentOS7

7. Unmount the disk:

umount /mnt

8. Detach the disk that you attached in step 2.
9. Attach the disk back to the affected instance.
10. Boot the affected instance.
11. Downgrade the packages on the affected instance:

yum downgrade shim\* grub2\* mokutil

12. Reboot.

Done.
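The mount, copy, and unmount steps above could be wrapped in a small helper. This is only a sketch under a few assumptions: the function names run and repair_efi are made up for illustration, the EFI partition of the attached disk is passed as the first argument (/dev/sdb1 in the lsblk output above), the second argument is the distro directory (redhat or centos), and /mnt is free to use. With DRY_RUN=1 the commands are only printed, so they can be reviewed first.

```shell
# Hypothetical helper: prints the command when DRY_RUN=1, otherwise runs it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Copies the working EFI binaries from this (healthy) system onto the
# EFI partition of the attached disk. Names and paths are assumptions
# taken from the steps above, not a tested recovery tool.
repair_efi() {
  efi_part=$1   # e.g. /dev/sdb1
  distro=$2     # "redhat" for RHEL 7, "centos" for CentOS 7

  run mount "$efi_part" /mnt || return 1

  run cp -f /boot/efi/EFI/BOOT/BOOTX64.EFI /mnt/EFI/BOOT/BOOTX64.EFI &&
    run cp -f /boot/efi/EFI/"$distro"/*.efi /mnt/EFI/"$distro"/
  status=$?

  # Always unmount, even if one of the copies failed.
  run umount /mnt
  return "$status"
}
```

For example, DRY_RUN=1 repair_efi /dev/sdb1 centos prints the four commands without touching the disk. Once the output looks right, the same call without DRY_RUN would need to run as root.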

Upvotes: 1

ben Thijssen

Reputation: 71

This only seems to affect systems using UEFI boot.

As Serhii Rohoza already mentioned, I solved the problem by following the page https://access.redhat.com/solutions/5272311

It is best to read that page completely. It also explains how to prevent the same problem from recurring on the next update while the bug remains unfixed. Be prepared: when using a rescue boot image you might have to edit /etc/resolv.conf and set the default gateway manually, with something like: ip route add default via 192.168.1.1 dev eno1

I also had to change my /etc/sysconfig/network-scripts/ifcfg- ...... from BOOTPROTO=static to BOOTPROTO=dhcp, along with some other settings like DNS="8.8.8.8" (Google nameserver). After fixing the problem, set everything back to its original state.
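Since the temporary network changes have to be reverted afterwards, it may help to keep a copy of each file before editing it. A minimal sketch, assuming a plain .bak copy next to the original is acceptable; the function names backup_config and restore_config are hypothetical:

```shell
# Save a copy of a config file (permissions and timestamps preserved)
# before making temporary changes in the rescue environment.
backup_config() {
  cp -p -- "$1" "$1.bak"
}

# Put the saved copy back, replacing the temporarily edited file.
restore_config() {
  mv -f -- "$1.bak" "$1"
}
```

For instance, calling backup_config /etc/resolv.conf before editing and restore_config /etc/resolv.conf after the repair brings the file back to its original contents.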

Upvotes: 0

Serhii

Reputation: 4461

This is a known issue, and Google engineers are aware of it:

We are currently experiencing an issue with Google Compute Engine instances running RHEL and CentOS 7 and 8. More details on this issue are available in the following article and bugs:

https://access.redhat.com/solutions/5272311

https://bugzilla.redhat.com/show_bug.cgi?id=1861977 (RHEL 8)

https://bugzilla.redhat.com/show_bug.cgi?id=1862045 (RHEL 7)

Symptoms: Instances running RHEL and CentOS 7 and 8 that run yum update may fail to boot after a restart, with error messages referring to a combination of:

"X64 Exception Type - 0D(#GP - General Protection) CPU Apic ID",

"FXSAVE_STATE",

or "Find image based on IP".
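A saved serial console log can be checked for these signatures with a short helper. This is a rough sketch: the function name has_shim_crash_signature and the file name serial.log are hypothetical, and a match only hints at this issue rather than proving it.

```shell
# Returns 0 if the given log file contains at least one of the error
# signatures listed above, non-zero otherwise.
has_shim_crash_signature() {
  grep -Eq 'X64 Exception Type|FXSAVE_STATE|Find image based on IP' "$1"
}

# Possible usage (INSTANCE_NAME and ZONE are placeholders):
#   gcloud compute instances get-serial-port-output INSTANCE_NAME --zone ZONE > serial.log
#   if has_shim_crash_signature serial.log; then echo "looks like the shim issue"; fi
```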

This issue affects instances with specific versions of the shim package installed. To find the currently installed shim version, use the following command: rpm -q shim-x64

Affected shim versions:

CentOS 7: shim-x64-15-7.el7_9.x86_64
CentOS 8: shim-x64-15-13.el8.x86_64
RHEL 7: shim-x64-15-7.el7_8.x86_64
RHEL 8: shim-x64-15-14.el8_2.x86_64
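To compare the installed package against that list before rebooting, something like the following could be used. It is a sketch only: the variable AFFECTED_SHIMS and the function is_affected_shim are made-up names, and the version strings are copied from the list above.

```shell
# Affected versions, copied from the list above (one per line).
AFFECTED_SHIMS="shim-x64-15-7.el7_9.x86_64
shim-x64-15-13.el8.x86_64
shim-x64-15-7.el7_8.x86_64
shim-x64-15-14.el8_2.x86_64"

# Returns 0 if the given package string exactly matches an affected version.
is_affected_shim() {
  printf '%s\n' "$AFFECTED_SHIMS" | grep -Fxq -- "$1"
}

# Possible usage on the VM:
#   if is_affected_shim "$(rpm -q shim-x64)"; then
#     echo "affected shim installed, do not reboot yet"
#   fi
```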

Workaround: Do not update or reboot instances running RHEL or CentOS 7 and 8. If you are on an affected shim version, run yum downgrade shim\* grub2\* mokutil to downgrade to the correct version. This command may not work on CentOS 8. If you have already rebooted, you will need to attach the disk to a working instance (that has not been updated with the problematic shim binary), and copy over the working shim binary to the relevant EFI directory on the mounted disk. For RHEL, this is /boot/efi/EFI/redhat/shimx64.efi. For CentOS, this is /boot/efi/EFI/centos/shimx64.efi
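If a startup script has to run yum update anyway, one option is to leave the boot packages out of the update. This is an assumption on my part, not something stated in the notice above: it presumes that skipping the packages named in the downgrade command is enough to keep the working shim in place. The function names are hypothetical; --exclude is a standard yum option.

```shell
# Hypothetical helper: prints the command when DRY_RUN=1, otherwise runs it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Updates everything except the packages named in the downgrade command.
# Assumption: excluding these three is sufficient to avoid the bad shim.
update_without_boot_packages() {
  run yum update -y --exclude='shim*' --exclude='grub2*' --exclude='mokutil'
}
```

Running DRY_RUN=1 update_without_boot_packages shows the exact command first; in a startup script the function would replace the plain yum update -y line.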

Please follow the Red Hat thread for real-time updates. We will update here once the issue is resolved on our end.

I'd recommend following this case on the Google Issue Tracker.

You can also check the status of this issue on the Google Cloud Status Dashboard.

Upvotes: 2
