Reputation: 982
I am running a Qt application on embedded Linux platform. The system has 128 MB RAM, 512MB NAND, no swap. The application uses a custom library for the peripherals, the rest are all Qt and c/c++ libs. The application uses SQLITE3 as well.
After 2-3 hours, the machine starts running very slow, shell commands take 10 or so seconds to respond. Eventually the machine hangs, and finally OOM killer kills the application, and the system starts behaving at normal speed.
After some system memory observations using top command reveals that while application is running, the system free memory is decreasing, while slab keeps on increasing. These are the snaps of top given below. The application is named xyz.
At Application start :
Mem total:126164 anon:3308 map:8436 free:32456
slab:60936 buf:0 cache:27528 dirty:0 write:0
Swap total:0 free:0
PID VSZ VSZRW^ RSS (SHR) DIRTY (SHR) STACK COMMAND
776 29080 9228 8036 528 968 0 84 ./xyz -qws
781 3960 736 1976 1456 520 0 84 sshd: root@notty
786 3676 680 1208 764 416 0 88 /usr/libexec/sftp-server
770 3792 568 1948 1472 464 0 84 {sshd} sshd: root@pts/0
766 3792 568 956 688 252 0 84 /usr/sbin/sshd
388 1864 284 552 332 188 0 84 udevd --daemon
789 2832 272 688 584 84 0 84 top
774 2828 268 668 560 84 0 84 -sh
709 2896 268 556 464 80 0 84 /usr/sbin/inetd
747 2828 268 596 516 68 0 84 /sbin/getty -L ttymxc0 115200 vt100
777 2824 264 444 368 68 0 84 tee out.log
785 2824 264 484 416 68 0 84 sh -c /usr/libexec/sftp-server
1 2824 264 556 488 64 0 84 init
After some time :
Mem total:126164 anon:3312 map:8440 free:9244
slab:83976 buf:0 cache:27584 dirty:0 write:0
Swap total:0 free:0
PID VSZ VSZRW^ RSS (SHR) DIRTY (SHR) STACK COMMAND
776 29080 9228 8044 528 972 0 84 ./xyz -qws
781 3960 736 1976 1456 520 0 84 sshd: root@notty
786 3676 680 1208 764 416 0 88 /usr/libexec/sftp-server
770 3792 568 1948 1472 464 0 84 {sshd} sshd: root@pts/0
766 3792 568 956 688 252 0 84 /usr/sbin/sshd
388 1864 284 552 332 188 0 84 udevd --daemon
789 2832 272 688 584 84 0 84 top
774 2828 268 668 560 84 0 84 -sh
709 2896 268 556 464 80 0 84 /usr/sbin/inetd
747 2828 268 596 516 68 0 84 /sbin/getty -L ttymxc0 115200 vt100
777 2824 264 444 368 68 0 84 tee out.log
785 2824 264 484 416 68 0 84 sh -c /usr/libexec/sftp-server
1 2824 264 556 488 64 0 84 init
Funnily though, I can not see any major changes in the output of top involving the application itself. Eventually the application is killed, top output after that :
Mem total:126164 anon:2356 map:916 free:2368
slab:117944 buf:0 cache:1580 dirty:0 write:0
Swap total:0 free:0
PID VSZ VSZRW^ RSS (SHR) DIRTY (SHR) STACK COMMAND
781 3960 736 708 184 520 0 84 sshd: root@notty
786 3724 728 736 172 484 0 88 /usr/libexec/sftp-server
770 3792 568 648 188 460 0 84 {sshd} sshd: root@pts/0
766 3792 568 252 0 252 0 84 /usr/sbin/sshd
388 1864 284 188 0 188 0 84 udevd --daemon
819 2832 272 676 348 84 0 84 top
774 2828 268 512 324 96 0 84 -sh
709 2896 268 80 0 80 0 84 /usr/sbin/inetd
747 2828 268 68 0 68 0 84 /sbin/getty -L ttymxc0 115200 vt100
785 2824 264 68 0 68 0 84 sh -c /usr/libexec/sftp-server
1 2824 264 64 0 64 0 84 init
The dmesg shows :
sh invoked oom-killer: gfp_mask=0xd0, order=2, oomkilladj=0
[<c002d4c4>] (unwind_backtrace+0x0/0xd4) from [<c0073ac0>] (oom_kill_process+0x54/0x1b8)
[<c0073ac0>] (oom_kill_process+0x54/0x1b8) from [<c0073f14>] (__out_of_memory+0x154/0x178)
[<c0073f14>] (__out_of_memory+0x154/0x178) from [<c0073fa0>] (out_of_memory+0x68/0x9c)
[<c0073fa0>] (out_of_memory+0x68/0x9c) from [<c007649c>] (__alloc_pages_nodemask+0x3e0/0x4c8)
[<c007649c>] (__alloc_pages_nodemask+0x3e0/0x4c8) from [<c0076598>] (__get_free_pages+0x14/0x4c)
[<c0076598>] (__get_free_pages+0x14/0x4c) from [<c002f528>] (get_pgd_slow+0x14/0xdc)
[<c002f528>] (get_pgd_slow+0x14/0xdc) from [<c0043890>] (mm_init+0x84/0xc4)
[<c0043890>] (mm_init+0x84/0xc4) from [<c0097b94>] (bprm_mm_init+0x10/0x138)
[<c0097b94>] (bprm_mm_init+0x10/0x138) from [<c00980a8>] (do_execve+0xf4/0x2a8)
[<c00980a8>] (do_execve+0xf4/0x2a8) from [<c002afc4>] (sys_execve+0x38/0x5c)
[<c002afc4>] (sys_execve+0x38/0x5c) from [<c0027d20>] (ret_fast_syscall+0x0/0x2c)
Mem-info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Normal per-cpu:
CPU 0: hi: 42, btch: 7 usd: 0
Active_anon:424 active_file:11 inactive_anon:428
inactive_file:3 unevictable:0 dirty:0 writeback:0 unstable:0
free:608 slab:29498 mapped:14 pagetables:59 bounce:0
DMA free:692kB min:268kB low:332kB high:400kB active_anon:0kB inactive_anon:0kB active_file:4kB inactive_file:0kB unevictable:0kB present:24384kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 103 103
Normal free:1740kB min:1168kB low:1460kB high:1752kB active_anon:1696kB inactive_anon:1712kB active_file:40kB inactive_file:12kB unevictable:0kB present:105664kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 3*4kB 3*8kB 5*16kB 2*32kB 4*64kB 2*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 692kB
Normal: 377*4kB 1*8kB 4*16kB 1*32kB 2*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1740kB
30 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
32768 pages of RAM
687 free pages
1306 reserved pages
29498 slab pages
59 pages shared
0 pages swap cached
Out of memory: kill process 774 (sh) score 339 or a child
Killed process 776 (xyz)
So it's obvious that there is a memory leak, it must be my app since my app is killed. But I am not doing any malloc s from the program. I have taken care as to limit the scope of variables so that they are deallocated after they are used. So I am at a complete loss as to why is slab increasing in the top output. I have tried http://valgrind.org/docs/manual/faq.html#faq.reports but didn't work.
Currently trying to use Valgrind on desktop (since I have read it only works for arm-cortex) to check my business logic.
Addittional info :
root@freescale ~/Application/app$ uname -a
Linux freescale 2.6.31-207-g7286c01 #2053 Fri Jun 22 10:29:11 IST 2012 armv5tejl GNU/Linux
Compiler : arm-none-linux-gnueabi-4.1.2 glibc2.5
cpp libs : libstdc++.so.6.0.8
Qt : 4.7.3 libs
Any pointers would be greatly appreciated...
Upvotes: 5
Views: 2618
Reputation: 1130
I don't think the problem is directly in your code. The reason is obvious: your application space does not increase (both RSS and VSW do not increase).
However, you do see the number of slabs increasing. You cannot use or increase the number of slabs from your application - it's a kernel-only thingie.
Some obvious causes of slab size increase from the top of my head:
I would run strace and look at its output for a while. strace intercepts interactions with the kernel. If you have memory issues, I'd expect repeated calls to brk(). If you have other issues, you'll see repeated calls to open without close.
Upvotes: 3
Reputation: 380
If you have some data structure allocation, check for the correctness of adding children and etc.. I had similar bug in my code. Also if you make big and large queries to the database it may use more ram memory. Try to find some memory leak detector to find if there is any leak.
Upvotes: 1