Reputation: 4537
I'm trying to run the darknet ImageNet classifier on Nao, but it crashes with a segfault.
With the YOLO config (./darknet detect cfg/yolo.cfg yolo.weights data/dog.jpg), darknet runs fine, but running the classifier (./darknet classifier predict cfg/imagenet1k.data cfg/extraction.cfg extraction.weights data/dog.jpg) only produces a segfault:
$ ./darknet classifier predict cfg/imagenet1k.data cfg/extraction.cfg extraction.weights data/dog.jpg
layer filters size input output
0 conv 64 7 x 7 / 2 224 x 224 x 3 -> 112 x 112 x 64
1 max 2 x 2 / 2 112 x 112 x 64 -> 56 x 56 x 64
2 conv 192 3 x 3 / 1 56 x 56 x 64 -> 56 x 56 x 192
3 max 2 x 2 / 2 56 x 56 x 192 -> 28 x 28 x 192
4 conv 128 1 x 1 / 1 28 x 28 x 192 -> 28 x 28 x 128
5 conv 256 3 x 3 / 1 28 x 28 x 128 -> 28 x 28 x 256
6 conv 256 1 x 1 / 1 28 x 28 x 256 -> 28 x 28 x 256
7 conv 512 3 x 3 / 1 28 x 28 x 256 -> 28 x 28 x 512
8 max 2 x 2 / 2 28 x 28 x 512 -> 14 x 14 x 512
9 conv 256 1 x 1 / 1 14 x 14 x 512 -> 14 x 14 x 256
10 conv 512 3 x 3 / 1 14 x 14 x 256 -> 14 x 14 x 512
11 conv 256 1 x 1 / 1 14 x 14 x 512 -> 14 x 14 x 256
12 conv 512 3 x 3 / 1 14 x 14 x 256 -> 14 x 14 x 512
13 conv 256 1 x 1 / 1 14 x 14 x 512 -> 14 x 14 x 256
14 conv 512 3 x 3 / 1 14 x 14 x 256 -> 14 x 14 x 512
15 conv 256 1 x 1 / 1 14 x 14 x 512 -> 14 x 14 x 256
16 Segmentation fault (core dumped)
A core dump is not available, as /proc/sys/kernel/core_pattern contains only |/bin/false.
But by running it under gdb, I could get the crash stack:
#0 0x0806efac in make_convolutional_layer ()
#1 0x080a4919 in parse_convolutional ()
#2 0x080a6e11 in parse_network_cfg ()
#3 0x0805d7ef in predict_classifier ()
#4 0x0805e85c in run_classifier ()
#5 0x080499c0 in main ()
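(For reference, the stack above was obtained with the usual gdb workflow; the exact addresses will differ:)
$ gdb --args ./darknet classifier predict cfg/imagenet1k.data cfg/extraction.cfg extraction.weights data/dog.jpg
(gdb) run
(gdb) bt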
I see that make_convolutional_layer allocates a fair amount of memory. Could the crash be caused by the program hitting a memory limit? However, in YOLO mode it builds a bigger network (with larger layer sizes), so that doesn't seem logical. Any ideas?
Upvotes: 1
Views: 1449
Reputation: 4537
It is indeed caused by a memory shortage: a call to calloc returns NULL, and the null pointer is then written to, hence the segfault.
(It always seems to happen on the line l.weights = calloc(c*n*size*size, sizeof(float)); in make_convolutional_layer, which on the 16th layer tries to allocate 4,718,592 bytes: with c=256, n=512 and size=3, that is 256 × 512 × 3 × 3 floats of 4 bytes each.)
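Darknet apparently uses the calloc result without checking it for NULL, which is why this shows up as a segfault rather than a clean error. If you're willing to patch the source, here is a minimal sketch of a checked wrapper (checked_calloc is a hypothetical name, not an existing darknet function):

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of darknet): exit with a clear
   message instead of segfaulting when an allocation fails. */
static void *checked_calloc(size_t nmemb, size_t size)
{
    void *p = calloc(nmemb, size);
    if (p == NULL) {
        fprintf(stderr, "calloc(%zu, %zu) failed: out of memory\n", nmemb, size);
        exit(EXIT_FAILURE);
    }
    return p;
}

Using it as l.weights = checked_calloc(c*n*size*size, sizeof(float)); would at least report the failing allocation size instead of crashing.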
So there doesn't seem to be a solution to the problem, apart from building a smaller network or increasing the available memory.
Edit: The smallest "Darknet Reference" network runs; the others are too heavy for Nao.
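For example, assuming the stock file names from the darknet repo (cfg/darknet.cfg and the matching darknet.weights), the reference model can be run with:
$ ./darknet classifier predict cfg/imagenet1k.data cfg/darknet.cfg darknet.weights data/dog.jpg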
Upvotes: 1