Reputation: 515
I built a Caffe network and solver for binary classification. When I run the code to train the network, it crashes with this error:
I0914 20:03:01.362612 4024 solver.cpp:280] Learning Rate Policy: step
I0914 20:03:01.367985 4024 solver.cpp:337] Iteration 0, Testing net (#0)
I0914 20:03:01.368085 4024 net.cpp:693] Ignoring source layer train_database
I0914 20:03:04.568979 4024 solver.cpp:404] Test net output #0: accuracy = 0.07575
I0914 20:03:04.569093 4024 solver.cpp:404] Test net output #1: loss = 2.20947 (* 1 = 2.20947 loss)
I0914 20:03:04.610549 4024 solver.cpp:228] Iteration 0, loss = 2.31814
I0914 20:03:04.610666 4024 solver.cpp:244] Train net output #0: loss = 2.31814 (* 1 = 2.31814 loss)
*** Aborted at 1473872584 (unix time) try "date -d @1473872584" if you are using GNU date ***
PC: @ 0x7f6870b62c52 caffe::SGDSolver<>::GetLearningRate()
*** SIGFPE (@0x7f6870b62c52) received by PID 4024 (TID 0x7f6871004a40) from PID 1890987090; stack trace: ***
@ 0x7f686f6bbcb0 (unknown)
@ 0x7f6870b62c52 caffe::SGDSolver<>::GetLearningRate()
@ 0x7f6870b62e44 caffe::SGDSolver<>::ApplyUpdate()
@ 0x7f6870b8e2fc caffe::Solver<>::Step()
@ 0x7f6870b8eb09 caffe::Solver<>::Solve()
@ 0x40821d train()
@ 0x40589c main
@ 0x7f686f6a6f45 (unknown)
@ 0x40610b (unknown)
@ 0x0 (unknown)
Floating point exception (core dumped)
I searched a lot, and the main solutions I found were:
1. Recompile Caffe. I tried make clean -> make all -> make test -> make runtest.
2. Change the driver that Linux uses. I switched from the driver marked in red to the one marked in green (note: I'm running Caffe on the CPU, as set in Makefile.config).
None of this helped, and I still can't train my network.
Does anyone have an idea? Thanks a lot, anyway :)
This is the full log:
/home/roishik/anaconda2/bin/python /home/roishik/Desktop/Thesis/Code/cafe_cnn/third/code/run_network.py
I0914 20:03:01.142490 4024 caffe.cpp:210] Use CPU.
I0914 20:03:01.142940 4024 solver.cpp:48] Initializing solver from parameters:
test_iter: 400
test_interval: 400
base_lr: 0.001
display: 50
max_iter: 40000
lr_policy: "step"
gamma: 0.1
momentum: 0.9
weight_decay: 0.0005
snapshot: 5000
snapshot_prefix: "/home/roishik/Desktop/Thesis/Code/cafe_cnn/third/caffe_models/my_new/snapshots"
solver_mode: CPU
net: "/home/roishik/Desktop/Thesis/Code/cafe_cnn/third/caffe_models/my_new/fc_net_ver1.prototxt"
train_state {
level: 0
stage: ""
}
I0914 20:03:01.143082 4024 solver.cpp:91] Creating training net from net file: /home/roishik/Desktop/Thesis/Code/cafe_cnn/third/caffe_models/my_new/fc_net_ver1.prototxt
I0914 20:03:01.143712 4024 net.cpp:322] The NetState phase (0) differed from the phase (1) specified by a rule in layer validation_database
I0914 20:03:01.143754 4024 net.cpp:322] The NetState phase (0) differed from the phase (1) specified by a rule in layer accuracy
I0914 20:03:01.143913 4024 net.cpp:58] Initializing net from parameters:
name: "fc2Net"
state {
phase: TRAIN
level: 0
stage: ""
}
layer {
name: "train_database"
type: "Data"
top: "data"
top: "label"
include {
phase: TRAIN
}
transform_param {
mean_file: "/home/roishik/Desktop/Thesis/Code/cafe_cnn/third/input/mean.binaryproto"
}
data_param {
source: "/home/roishik/Desktop/Thesis/Code/cafe_cnn/third/input/train_lmdb"
batch_size: 200
backend: LMDB
}
}
layer {
name: "fc1"
type: "InnerProduct"
bottom: "data"
top: "fc1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 1024
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "fc1"
top: "fc1"
}
layer {
name: "fc2"
type: "InnerProduct"
bottom: "fc1"
top: "fc2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 1024
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "fc2"
top: "fc2"
}
layer {
name: "fc3"
type: "InnerProduct"
bottom: "fc2"
top: "fc3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc3"
bottom: "label"
top: "loss"
}
I0914 20:03:01.144016 4024 layer_factory.hpp:77] Creating layer train_database
I0914 20:03:01.144811 4024 net.cpp:100] Creating Layer train_database
I0914 20:03:01.144846 4024 net.cpp:408] train_database -> data
I0914 20:03:01.144909 4024 net.cpp:408] train_database -> label
I0914 20:03:01.144951 4024 data_transformer.cpp:25] Loading mean file from: /home/roishik/Desktop/Thesis/Code/cafe_cnn/third/input/mean.binaryproto
I0914 20:03:01.153393 4035 db_lmdb.cpp:35] Opened lmdb /home/roishik/Desktop/Thesis/Code/cafe_cnn/third/input/train_lmdb
I0914 20:03:01.153481 4024 data_layer.cpp:41] output data size: 200,1,32,32
I0914 20:03:01.154615 4024 net.cpp:150] Setting up train_database
I0914 20:03:01.154670 4024 net.cpp:157] Top shape: 200 1 32 32 (204800)
I0914 20:03:01.154693 4024 net.cpp:157] Top shape: 200 (200)
I0914 20:03:01.154712 4024 net.cpp:165] Memory required for data: 820000
I0914 20:03:01.154742 4024 layer_factory.hpp:77] Creating layer fc1
I0914 20:03:01.154781 4024 net.cpp:100] Creating Layer fc1
I0914 20:03:01.154804 4024 net.cpp:434] fc1 <- data
I0914 20:03:01.154837 4024 net.cpp:408] fc1 -> fc1
I0914 20:03:01.159675 4036 blocking_queue.cpp:50] Waiting for data
I0914 20:03:01.215118 4024 net.cpp:150] Setting up fc1
I0914 20:03:01.215214 4024 net.cpp:157] Top shape: 200 1024 (204800)
I0914 20:03:01.215237 4024 net.cpp:165] Memory required for data: 1639200
I0914 20:03:01.215306 4024 layer_factory.hpp:77] Creating layer relu1
I0914 20:03:01.215342 4024 net.cpp:100] Creating Layer relu1
I0914 20:03:01.215363 4024 net.cpp:434] relu1 <- fc1
I0914 20:03:01.215387 4024 net.cpp:395] relu1 -> fc1 (in-place)
I0914 20:03:01.215417 4024 net.cpp:150] Setting up relu1
I0914 20:03:01.215440 4024 net.cpp:157] Top shape: 200 1024 (204800)
I0914 20:03:01.215459 4024 net.cpp:165] Memory required for data: 2458400
I0914 20:03:01.215478 4024 layer_factory.hpp:77] Creating layer fc2
I0914 20:03:01.215504 4024 net.cpp:100] Creating Layer fc2
I0914 20:03:01.215524 4024 net.cpp:434] fc2 <- fc1
I0914 20:03:01.215549 4024 net.cpp:408] fc2 -> fc2
I0914 20:03:01.264021 4024 net.cpp:150] Setting up fc2
I0914 20:03:01.264062 4024 net.cpp:157] Top shape: 200 1024 (204800)
I0914 20:03:01.264072 4024 net.cpp:165] Memory required for data: 3277600
I0914 20:03:01.264097 4024 layer_factory.hpp:77] Creating layer relu2
I0914 20:03:01.264118 4024 net.cpp:100] Creating Layer relu2
I0914 20:03:01.264129 4024 net.cpp:434] relu2 <- fc2
I0914 20:03:01.264143 4024 net.cpp:395] relu2 -> fc2 (in-place)
I0914 20:03:01.264166 4024 net.cpp:150] Setting up relu2
I0914 20:03:01.264181 4024 net.cpp:157] Top shape: 200 1024 (204800)
I0914 20:03:01.264190 4024 net.cpp:165] Memory required for data: 4096800
I0914 20:03:01.264201 4024 layer_factory.hpp:77] Creating layer fc3
I0914 20:03:01.264219 4024 net.cpp:100] Creating Layer fc3
I0914 20:03:01.264230 4024 net.cpp:434] fc3 <- fc2
I0914 20:03:01.264245 4024 net.cpp:408] fc3 -> fc3
I0914 20:03:01.264389 4024 net.cpp:150] Setting up fc3
I0914 20:03:01.264407 4024 net.cpp:157] Top shape: 200 2 (400)
I0914 20:03:01.264416 4024 net.cpp:165] Memory required for data: 4098400
I0914 20:03:01.264434 4024 layer_factory.hpp:77] Creating layer loss
I0914 20:03:01.264447 4024 net.cpp:100] Creating Layer loss
I0914 20:03:01.264459 4024 net.cpp:434] loss <- fc3
I0914 20:03:01.264469 4024 net.cpp:434] loss <- label
I0914 20:03:01.264487 4024 net.cpp:408] loss -> loss
I0914 20:03:01.264513 4024 layer_factory.hpp:77] Creating layer loss
I0914 20:03:01.264544 4024 net.cpp:150] Setting up loss
I0914 20:03:01.264559 4024 net.cpp:157] Top shape: (1)
I0914 20:03:01.264569 4024 net.cpp:160] with loss weight 1
I0914 20:03:01.264595 4024 net.cpp:165] Memory required for data: 4098404
I0914 20:03:01.264606 4024 net.cpp:226] loss needs backward computation.
I0914 20:03:01.264617 4024 net.cpp:226] fc3 needs backward computation.
I0914 20:03:01.264626 4024 net.cpp:226] relu2 needs backward computation.
I0914 20:03:01.264636 4024 net.cpp:226] fc2 needs backward computation.
I0914 20:03:01.264647 4024 net.cpp:226] relu1 needs backward computation.
I0914 20:03:01.264655 4024 net.cpp:226] fc1 needs backward computation.
I0914 20:03:01.264667 4024 net.cpp:228] train_database does not need backward computation.
I0914 20:03:01.264675 4024 net.cpp:270] This network produces output loss
I0914 20:03:01.264695 4024 net.cpp:283] Network initialization done.
I0914 20:03:01.265384 4024 solver.cpp:181] Creating test net (#0) specified by net file: /home/roishik/Desktop/Thesis/Code/cafe_cnn/third/caffe_models/my_new/fc_net_ver1.prototxt
I0914 20:03:01.265435 4024 net.cpp:322] The NetState phase (1) differed from the phase (0) specified by a rule in layer train_database
I0914 20:03:01.265606 4024 net.cpp:58] Initializing net from parameters:
name: "fc2Net"
state {
phase: TEST
}
layer {
name: "validation_database"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
mean_file: "/home/roishik/Desktop/Thesis/Code/cafe_cnn/second/input/mean.binaryproto"
}
data_param {
source: "/home/roishik/Desktop/Thesis/Code/cafe_cnn/second/input/validation_lmdb"
batch_size: 40
backend: LMDB
}
}
layer {
name: "fc1"
type: "InnerProduct"
bottom: "data"
top: "fc1"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 1024
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "fc1"
top: "fc1"
}
layer {
name: "fc2"
type: "InnerProduct"
bottom: "fc1"
top: "fc2"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 1024
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "relu2"
type: "ReLU"
bottom: "fc2"
top: "fc2"
}
layer {
name: "fc3"
type: "InnerProduct"
bottom: "fc2"
top: "fc3"
param {
lr_mult: 1
decay_mult: 1
}
param {
lr_mult: 2
decay_mult: 0
}
inner_product_param {
num_output: 2
weight_filler {
type: "gaussian"
std: 0.01
}
bias_filler {
type: "constant"
value: 0
}
}
}
layer {
name: "accuracy"
type: "Accuracy"
bottom: "fc3"
bottom: "label"
top: "accuracy"
include {
phase: TEST
}
}
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "fc3"
bottom: "label"
top: "loss"
}
I0914 20:03:01.265750 4024 layer_factory.hpp:77] Creating layer validation_database
I0914 20:03:01.265878 4024 net.cpp:100] Creating Layer validation_database
I0914 20:03:01.265897 4024 net.cpp:408] validation_database -> data
I0914 20:03:01.265918 4024 net.cpp:408] validation_database -> label
I0914 20:03:01.265936 4024 data_transformer.cpp:25] Loading mean file from: /home/roishik/Desktop/Thesis/Code/cafe_cnn/second/input/mean.binaryproto
I0914 20:03:01.266034 4037 db_lmdb.cpp:35] Opened lmdb /home/roishik/Desktop/Thesis/Code/cafe_cnn/second/input/validation_lmdb
I0914 20:03:01.266098 4024 data_layer.cpp:41] output data size: 40,1,32,32
I0914 20:03:01.266295 4024 net.cpp:150] Setting up validation_database
I0914 20:03:01.266315 4024 net.cpp:157] Top shape: 40 1 32 32 (40960)
I0914 20:03:01.266330 4024 net.cpp:157] Top shape: 40 (40)
I0914 20:03:01.266340 4024 net.cpp:165] Memory required for data: 164000
I0914 20:03:01.266350 4024 layer_factory.hpp:77] Creating layer label_validation_database_1_split
I0914 20:03:01.266386 4024 net.cpp:100] Creating Layer label_validation_database_1_split
I0914 20:03:01.266404 4024 net.cpp:434] label_validation_database_1_split <- label
I0914 20:03:01.266422 4024 net.cpp:408] label_validation_database_1_split -> label_validation_database_1_split_0
I0914 20:03:01.266443 4024 net.cpp:408] label_validation_database_1_split -> label_validation_database_1_split_1
I0914 20:03:01.266464 4024 net.cpp:150] Setting up label_validation_database_1_split
I0914 20:03:01.266480 4024 net.cpp:157] Top shape: 40 (40)
I0914 20:03:01.266494 4024 net.cpp:157] Top shape: 40 (40)
I0914 20:03:01.266505 4024 net.cpp:165] Memory required for data: 164320
I0914 20:03:01.266515 4024 layer_factory.hpp:77] Creating layer fc1
I0914 20:03:01.266531 4024 net.cpp:100] Creating Layer fc1
I0914 20:03:01.266543 4024 net.cpp:434] fc1 <- data
I0914 20:03:01.266558 4024 net.cpp:408] fc1 -> fc1
I0914 20:03:01.320364 4024 net.cpp:150] Setting up fc1
I0914 20:03:01.320461 4024 net.cpp:157] Top shape: 40 1024 (40960)
I0914 20:03:01.320489 4024 net.cpp:165] Memory required for data: 328160
I0914 20:03:01.320533 4024 layer_factory.hpp:77] Creating layer relu1
I0914 20:03:01.320571 4024 net.cpp:100] Creating Layer relu1
I0914 20:03:01.320597 4024 net.cpp:434] relu1 <- fc1
I0914 20:03:01.320627 4024 net.cpp:395] relu1 -> fc1 (in-place)
I0914 20:03:01.320652 4024 net.cpp:150] Setting up relu1
I0914 20:03:01.320667 4024 net.cpp:157] Top shape: 40 1024 (40960)
I0914 20:03:01.320678 4024 net.cpp:165] Memory required for data: 492000
I0914 20:03:01.320689 4024 layer_factory.hpp:77] Creating layer fc2
I0914 20:03:01.320709 4024 net.cpp:100] Creating Layer fc2
I0914 20:03:01.320719 4024 net.cpp:434] fc2 <- fc1
I0914 20:03:01.320734 4024 net.cpp:408] fc2 -> fc2
I0914 20:03:01.361732 4024 net.cpp:150] Setting up fc2
I0914 20:03:01.361766 4024 net.cpp:157] Top shape: 40 1024 (40960)
I0914 20:03:01.361802 4024 net.cpp:165] Memory required for data: 655840
I0914 20:03:01.361821 4024 layer_factory.hpp:77] Creating layer relu2
I0914 20:03:01.361837 4024 net.cpp:100] Creating Layer relu2
I0914 20:03:01.361845 4024 net.cpp:434] relu2 <- fc2
I0914 20:03:01.361852 4024 net.cpp:395] relu2 -> fc2 (in-place)
I0914 20:03:01.361866 4024 net.cpp:150] Setting up relu2
I0914 20:03:01.361872 4024 net.cpp:157] Top shape: 40 1024 (40960)
I0914 20:03:01.361877 4024 net.cpp:165] Memory required for data: 819680
I0914 20:03:01.361881 4024 layer_factory.hpp:77] Creating layer fc3
I0914 20:03:01.361892 4024 net.cpp:100] Creating Layer fc3
I0914 20:03:01.361901 4024 net.cpp:434] fc3 <- fc2
I0914 20:03:01.361909 4024 net.cpp:408] fc3 -> fc3
I0914 20:03:01.362009 4024 net.cpp:150] Setting up fc3
I0914 20:03:01.362017 4024 net.cpp:157] Top shape: 40 2 (80)
I0914 20:03:01.362022 4024 net.cpp:165] Memory required for data: 820000
I0914 20:03:01.362032 4024 layer_factory.hpp:77] Creating layer fc3_fc3_0_split
I0914 20:03:01.362041 4024 net.cpp:100] Creating Layer fc3_fc3_0_split
I0914 20:03:01.362046 4024 net.cpp:434] fc3_fc3_0_split <- fc3
I0914 20:03:01.362053 4024 net.cpp:408] fc3_fc3_0_split -> fc3_fc3_0_split_0
I0914 20:03:01.362062 4024 net.cpp:408] fc3_fc3_0_split -> fc3_fc3_0_split_1
I0914 20:03:01.362073 4024 net.cpp:150] Setting up fc3_fc3_0_split
I0914 20:03:01.362082 4024 net.cpp:157] Top shape: 40 2 (80)
I0914 20:03:01.362088 4024 net.cpp:157] Top shape: 40 2 (80)
I0914 20:03:01.362093 4024 net.cpp:165] Memory required for data: 820640
I0914 20:03:01.362097 4024 layer_factory.hpp:77] Creating layer accuracy
I0914 20:03:01.362120 4024 net.cpp:100] Creating Layer accuracy
I0914 20:03:01.362128 4024 net.cpp:434] accuracy <- fc3_fc3_0_split_0
I0914 20:03:01.362134 4024 net.cpp:434] accuracy <- label_validation_database_1_split_0
I0914 20:03:01.362141 4024 net.cpp:408] accuracy -> accuracy
I0914 20:03:01.362152 4024 net.cpp:150] Setting up accuracy
I0914 20:03:01.362159 4024 net.cpp:157] Top shape: (1)
I0914 20:03:01.362164 4024 net.cpp:165] Memory required for data: 820644
I0914 20:03:01.362169 4024 layer_factory.hpp:77] Creating layer loss
I0914 20:03:01.362176 4024 net.cpp:100] Creating Layer loss
I0914 20:03:01.362181 4024 net.cpp:434] loss <- fc3_fc3_0_split_1
I0914 20:03:01.362187 4024 net.cpp:434] loss <- label_validation_database_1_split_1
I0914 20:03:01.362193 4024 net.cpp:408] loss -> loss
I0914 20:03:01.362226 4024 layer_factory.hpp:77] Creating layer loss
I0914 20:03:01.362251 4024 net.cpp:150] Setting up loss
I0914 20:03:01.362265 4024 net.cpp:157] Top shape: (1)
I0914 20:03:01.362277 4024 net.cpp:160] with loss weight 1
I0914 20:03:01.362298 4024 net.cpp:165] Memory required for data: 820648
I0914 20:03:01.362311 4024 net.cpp:226] loss needs backward computation.
I0914 20:03:01.362323 4024 net.cpp:228] accuracy does not need backward computation.
I0914 20:03:01.362336 4024 net.cpp:226] fc3_fc3_0_split needs backward computation.
I0914 20:03:01.362347 4024 net.cpp:226] fc3 needs backward computation.
I0914 20:03:01.362360 4024 net.cpp:226] relu2 needs backward computation.
I0914 20:03:01.362370 4024 net.cpp:226] fc2 needs backward computation.
I0914 20:03:01.362381 4024 net.cpp:226] relu1 needs backward computation.
I0914 20:03:01.362392 4024 net.cpp:226] fc1 needs backward computation.
I0914 20:03:01.362403 4024 net.cpp:228] label_validation_database_1_split does not need backward computation.
I0914 20:03:01.362416 4024 net.cpp:228] validation_database does not need backward computation.
I0914 20:03:01.362426 4024 net.cpp:270] This network produces output accuracy
I0914 20:03:01.362438 4024 net.cpp:270] This network produces output loss
I0914 20:03:01.362460 4024 net.cpp:283] Network initialization done.
I0914 20:03:01.362552 4024 solver.cpp:60] Solver scaffolding done.
I0914 20:03:01.362591 4024 caffe.cpp:251] Starting Optimization
I0914 20:03:01.362601 4024 solver.cpp:279] Solving fc2Net
I0914 20:03:01.362612 4024 solver.cpp:280] Learning Rate Policy: step
I0914 20:03:01.367985 4024 solver.cpp:337] Iteration 0, Testing net (#0)
I0914 20:03:01.368085 4024 net.cpp:693] Ignoring source layer train_database
I0914 20:03:04.568979 4024 solver.cpp:404] Test net output #0: accuracy = 0.07575
I0914 20:03:04.569093 4024 solver.cpp:404] Test net output #1: loss = 2.20947 (* 1 = 2.20947 loss)
I0914 20:03:04.610549 4024 solver.cpp:228] Iteration 0, loss = 2.31814
I0914 20:03:04.610666 4024 solver.cpp:244] Train net output #0: loss = 2.31814 (* 1 = 2.31814 loss)
*** Aborted at 1473872584 (unix time) try "date -d @1473872584" if you are using GNU date ***
PC: @ 0x7f6870b62c52 caffe::SGDSolver<>::GetLearningRate()
*** SIGFPE (@0x7f6870b62c52) received by PID 4024 (TID 0x7f6871004a40) from PID 1890987090; stack trace: ***
@ 0x7f686f6bbcb0 (unknown)
@ 0x7f6870b62c52 caffe::SGDSolver<>::GetLearningRate()
@ 0x7f6870b62e44 caffe::SGDSolver<>::ApplyUpdate()
@ 0x7f6870b8e2fc caffe::Solver<>::Step()
@ 0x7f6870b8eb09 caffe::Solver<>::Solve()
@ 0x40821d train()
@ 0x40589c main
@ 0x7f686f6a6f45 (unknown)
@ 0x40610b (unknown)
@ 0x0 (unknown)
Floating point exception (core dumped)
Done!
Upvotes: 0
Views: 506
Reputation: 114976
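Look at your error message: you got a SIGFPE signal, which indicates an arithmetic error. Furthermore, the function that raised it is GetLearningRate(), the function that evaluates the learning rate.
It appears that you did not configure the learning rate policy correctly in your 'solver.prototxt'. The solver parameters printed in your log set lr_policy: "step" and gamma, but no stepsize. The "step" policy divides the iteration number by stepsize, so a missing stepsize (most likely left at zero) gives an integer division by zero.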
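To illustrate, here is a minimal Python sketch of the "step" policy as Caffe documents it, base_lr * gamma ^ floor(iter / stepsize). It is not Caffe's own code: the function name step_learning_rate is made up for this example, and stepsize=10000 is an assumed value, since the question's solver does not set one. base_lr and gamma are taken from the log.

```python
def step_learning_rate(base_lr, gamma, stepsize, iteration):
    """Sketch of the "step" policy: base_lr * gamma ** floor(iteration / stepsize)."""
    # Integer division, as in the solver; a stepsize of 0 divides by zero here.
    current_step = iteration // stepsize
    return base_lr * gamma ** current_step


# base_lr and gamma come from the log; stepsize=10000 is an assumption.
print(step_learning_rate(0.001, 0.1, 10000, 0))
print(step_learning_rate(0.001, 0.1, 10000, 20000))

# With stepsize left at 0, the same computation fails.
try:
    step_learning_rate(0.001, 0.1, 0, 0)
except ZeroDivisionError as err:
    print("stepsize=0 fails:", err)
```

In Python the division by zero raises an exception; in the compiled solver the same integer division would show up as SIGFPE. If this is indeed the cause, adding a positive stepsize to the solver prototxt (for example stepsize: 10000, a value you should pick for your own schedule) should get past the crash.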
Upvotes: 4