user3745855
user3745855

Reputation: 83

CentOs 7 fails to boot crash kernel and generate dump in /var/crash

We have an issue where our CentOS 7 server will not generate a kernel dump file in /var/crash upon Kernel panic. It appears the crash kernel never boots. We’ve followed the Rhel guide (http://red.ht/1sCztdv) on configuring crash dumps and at first glance everything appears to be configured correctly. We are triggering a panic like this:

echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger

This causes the system to freeze. We get no messages on the console and the console becomes unresponsive. At this point I would imagine the system would boot a crash kernel and begin writing a dump out to /var/crash. I’ve left it in this frozen state for up to 30 minutes to give it time to complete the entire dump. However after a hard cold reboot /var/crash is empty.

Additionally, I've replicated the configuration in a KVM virtual machine and kdump words as expected. So there is either something wrong with my configuration on the physical system or something odd about that hardware config that causes the hang rather than the dump.

Our server is an HP G9 with 24 cores and 128GB of memory. Here are some other details:

[user@host]$ cat /proc/cmdline

BOOT_IMAGE=/vmlinuz-3.10.0-123.el7.x86_64 root=UUID=287798f7-fe7a-4172-a35a-6a78051af4d2 ro rd.lvm.lv=vg_sda/lv_root vconsole.font=latarcyrheb-sun16 rd.lvm.lv=vg_sda/lv_swap crashkernel=auto vconsole.keymap=us rhgb nosoftlockup intel_idle.max_cstate=0 mce=ignore_ce processor.max_cstate=0 idle=mwait isolcpus=2-11,14-23

[user@host]$ systemctl is-active kdump
active

[user@host]$ cat /etc/kdump.conf 

path /var/crash
core_collector makedumpfile -l --message-level 1 -d 31 -c

[user@host]$ cat /proc/iomem |grep Crash
2b000000-357fffff : Crash kernel

[user@host]$ dmesg|grep Reserving
[    0.000000] Reserving 168MB of memory at 688MB for crashkernel (System RAM: 131037MB)

[user@host]$ df -h
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/vg_sda-lv_root  133G  4.7G  128G   4% /
devtmpfs                     63G     0   63G   0% /dev
tmpfs                        63G     0   63G   0% /dev/shm
tmpfs                        63G  9.1M   63G   1% /run
tmpfs                        63G     0   63G   0% /sys/fs/cgroup
/dev/sda1                   492M  175M  318M  36% /boot
/dev/mapper/vg_sdb-lv_data  2.8T  145G  2.6T   6% /data

Upvotes: 2

Views: 5980

Answers (2)

Nick
Nick

Reputation: 11

Eric,

1G seems a bit large. I've never seen anything larger than 200M for a normal server. Not sure about the sysconfig settings. Compression is a good idea but I don't think it would affect the issue since you're target is close to total memory and you're only dumping the kernel ring.

Upvotes: 0

user3745855
user3745855

Reputation: 83

After modifying the following parameters we were able to reliably get crash dumps:

  1. Changed crashkernel=auto to crashkernel=1G: I'm not sure why we need 1G as the formula indicated 128M+64M for every 1TB of ram.
  2. /etc/sysconfig/kdump: Removed everything from KDUMP_COMMANDLINE_APPEND excpet irqpoll nr_cpus=1 resulting in: KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1
  3. /etc/kdump.cfg: Add compression (“-c”) to makedump

Not 100% sure why this works but it does. Would love to know what others think

Eric

Upvotes: 2

Related Questions