Reputation: 83
We have an issue where our CentOS 7 server will not generate a kernel dump file in /var/crash upon kernel panic. It appears the crash kernel never boots. We’ve followed the RHEL guide (http://red.ht/1sCztdv) on configuring crash dumps, and at first glance everything appears to be configured correctly. We are triggering a panic like this:
echo 1 > /proc/sys/kernel/sysrq
echo c > /proc/sysrq-trigger
This causes the system to freeze. We get no messages on the console and the console becomes unresponsive. At this point I would imagine the system would boot a crash kernel and begin writing a dump out to /var/crash. I’ve left it in this frozen state for up to 30 minutes to give it time to complete the entire dump. However after a hard cold reboot /var/crash is empty.
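For anyone reproducing this, one quick sanity check before triggering the panic is to confirm that a crash kernel is actually loaded, since `systemctl is-active kdump` only reports the service state. The sketch below assumes the standard sysfs file /sys/kernel/kexec_crash_loaded is present; the function name `crash_kernel_loaded` is made up for illustration.

```shell
#!/bin/bash
# Succeeds (exit 0) if the given sysfs file reports a loaded crash kernel.
# The path defaults to the usual location; pass another path to override it.
crash_kernel_loaded() {
    file=${1:-/sys/kernel/kexec_crash_loaded}
    [ -r "$file" ] && [ "$(cat "$file")" = "1" ]
}

if crash_kernel_loaded; then
    echo "crash kernel loaded; a panic should kexec into it"
else
    echo "no crash kernel loaded; a panic will not produce a dump"
fi
```

If this reports a loaded crash kernel and the machine still hangs, the problem is more likely in the crash kernel's own boot than in the kdump service configuration.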
Additionally, I've replicated the configuration in a KVM virtual machine and kdump works as expected. So there is either something wrong with my configuration on the physical system or something odd about that hardware config that causes the hang rather than the dump.
Our server is an HP G9 with 24 cores and 128GB of memory. Here are some other details:
[user@host]$ cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-123.el7.x86_64 root=UUID=287798f7-fe7a-4172-a35a-6a78051af4d2 ro rd.lvm.lv=vg_sda/lv_root vconsole.font=latarcyrheb-sun16 rd.lvm.lv=vg_sda/lv_swap crashkernel=auto vconsole.keymap=us rhgb nosoftlockup intel_idle.max_cstate=0 mce=ignore_ce processor.max_cstate=0 idle=mwait isolcpus=2-11,14-23
[user@host]$ systemctl is-active kdump
active
[user@host]$ cat /etc/kdump.conf
path /var/crash
core_collector makedumpfile -l --message-level 1 -d 31 -c
[user@host]$ cat /proc/iomem |grep Crash
2b000000-357fffff : Crash kernel
[user@host]$ dmesg|grep Reserving
[ 0.000000] Reserving 168MB of memory at 688MB for crashkernel (System RAM: 131037MB)
[user@host]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_sda-lv_root 133G 4.7G 128G 4% /
devtmpfs 63G 0 63G 0% /dev
tmpfs 63G 0 63G 0% /dev/shm
tmpfs 63G 9.1M 63G 1% /run
tmpfs 63G 0 63G 0% /sys/fs/cgroup
/dev/sda1 492M 175M 318M 36% /boot
/dev/mapper/vg_sdb-lv_data 2.8T 145G 2.6T 6% /data
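As a cross-check on the reservation shown above, the `Crash kernel` range from /proc/iomem can be converted to a size and compared with the dmesg line. This is a small sketch assuming bash; the function name `iomem_range_mib` is hypothetical.

```shell
#!/bin/bash
# Print the size in MiB of an inclusive hex range in /proc/iomem format,
# e.g. "2b000000-357fffff".
iomem_range_mib() {
    range=$1
    start=${range%%-*}   # text before the dash
    end=${range##*-}     # text after the dash
    # The range is inclusive, hence the +1 before dividing by 1 MiB.
    echo $(( (0x$end - 0x$start + 1) / 1048576 ))
}

iomem_range_mib 2b000000-357fffff   # prints 168
```

The result agrees with the "Reserving 168MB" message, so what `crashkernel=auto` reserved is what the running kernel reports.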
Upvotes: 2
Views: 5980
Reputation: 11
Eric,
1G seems a bit large. I've never seen anything larger than 200M for a normal server. Not sure about the sysconfig settings. Compression is a good idea, but I don't think it would affect the issue since your target is close to total memory and you're only dumping the kernel ring.
Upvotes: 0
Reputation: 83
After modifying the following parameters we were able to reliably get crash dumps:
Not 100% sure why this works, but it does. Would love to know what others think.
Eric
Upvotes: 2