Reputation: 125
I am trying to train YOLO on a custom dataset, and everything seems to run without errors, but it just isn't training.
I followed the tutorial at https://github.com/AlexeyAB/darknet twice, but I get the same result:
./darknet detector train data/obj.data cfg/yolo-obj.cfg yolov4.conv.137
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000
Total BFLOPS 59.563
avg_outputs = 489778
Loading weights from yolov4.conv.137...
seen 64, trained: 0 K-images (0 Kilo-batches_64)
Done! Loaded 137 layers from weights-file
Learning Rate: 0.001, Momentum: 0.949, Decay: 0.0005
Resizing, random_coef = 1.40
608 x 608
Create 64 permanent cpu-threads
mosaic=1 - compile Darknet with OpenCV for using mosaic=1
I also tried without the pre-trained weights, but this doesn't start the training process either:
./darknet detector train data/obj.data cfg/yolo-obj.cfg
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000
Total BFLOPS 59.563
avg_outputs = 489778
Learning Rate: 0.001, Momentum: 0.949, Decay: 0.0005
Resizing, random_coef = 1.40
608 x 608
Create 64 permanent cpu-threads
mosaic=1 - compile Darknet with OpenCV for using mosaic=1
What am I missing?
Upvotes: 6
Views: 12594
Reputation: 1
Check your resource utilization when you start training, and see whether your RAM usage exceeds what is available.
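A quick way to do that is to watch memory from a second terminal while training starts; a minimal sketch using standard tools (the nvidia-smi line only applies if you train on a GPU):
watch -n 1 free -h      # system RAM usage, refreshed every second
watch -n 1 nvidia-smi   # GPU memory usage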
If RAM is indeed the problem, then try this solution:
CFG parameters in the [net] section:
batch - number of samples (images, letters, ...) which will be processed in one batch
subdivisions - number of mini-batches in one batch; mini_batch size = batch/subdivisions, so the GPU processes mini_batch samples at once, and the weights are updated once per batch samples (1 iteration processes batch images)
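To make the arithmetic concrete, here is an illustrative [net] fragment (the values are examples, not a recommendation):
[net]
# mini_batch = batch/subdivisions = 64/16 = 4 images processed by the GPU at once;
# weights are updated once per 64 images (one iteration)
batch=64
subdivisions=16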
With reference to these definitions, I tried various combinations of mini-batch size, like:
batch=64, subdivisions=8
"OR"
batch=64, subdivisions=16
and so on...
I found that my Colab instance only works with mini_batch = 2, so I set subdivisions to half of batch, like:
batch=64
subdivisions=32
"OR"
batch=32
subdivisions=16
or any other such combination.
It also raises an error when I use
batch=1, subdivisions=1
Upvotes: 0
Reputation: 69
Use this to enable OpenCV:
$ git clone https://github.com/AlexeyAB/darknet.git
$ cd darknet
$ sed -i 's/OPENCV=0/OPENCV=1/' Makefile
https://github.com/AlexeyAB/darknet/blob/master/Makefile#L4
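Note that changing the Makefile alone is not enough; you still need to rebuild for the flag to take effect (assuming you build with the repository's Makefile):
$ make clean
$ make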
Upvotes: 0
Reputation: 41
My friend, I just solved this problem, and I think I've found the reason. If your train.txt/test.txt files are empty, this is the cause. Open "creating-train-and-test-txt-files.py" and edit it: search for the keyword "jpeg" (it appears in only two places) and change both occurrences to "jpg", then replace the file in your Google Drive. Finally, restart the Colab session. Your training will then no longer quit after "608 x 608 Create 64 permanent cpu-threads".
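If it helps, the same edit can be done in one command (a sketch, assuming the script filename above and that the mismatch is exactly "jpeg" vs. "jpg"); re-run the script afterwards so train.txt/test.txt are regenerated:
sed -i 's/jpeg/jpg/g' creating-train-and-test-txt-files.py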
Best wishes from China.
Upvotes: 4
Reputation: 23
The above error is mainly caused by empty train.txt and test.txt files. Please check these two files.
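A quick way to check, assuming the list-file paths named in your obj.data (commonly data/train.txt and data/test.txt):
wc -l data/train.txt data/test.txt   # both line counts should be non-zero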
Upvotes: 0
Reputation: 11208
How have you installed OpenCV?
For a simple fix, you can try this:
sudo apt install libopencv-dev python3-opencv
Also make sure you have cmake:
sudo apt install cmake
This should install OpenCV 3.2 and CMake 3.10 on your system. Then try running darknet.
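To confirm both are visible before you rebuild (a quick check; "opencv" is the pkg-config module name for OpenCV 3.x on Ubuntu):
pkg-config --modversion opencv   # should print something like 3.2.0
cmake --version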
Finally, change the Makefile:
OPENCV=1
Upvotes: 0
Reputation: 129
If you want to use OpenCV you need to re-compile Darknet, but first change the Makefile as follows:
OPENCV=1
If you don't need OpenCV, then do as @TaQuangTu suggested. Once you have fixed this line, just run the build.sh script again and it should work just fine.
I'd also suggest changing the following lines if you intend to train using a GPU:
GPU=1
CUDNN=1
CUDNN_HALF=1
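As a sketch, the whole set of edits plus the rebuild could look like this (assuming the stock AlexeyAB Makefile, whose flags all default to 0):
sed -i 's/OPENCV=0/OPENCV=1/; s/GPU=0/GPU=1/; s/CUDNN=0/CUDNN=1/; s/CUDNN_HALF=0/CUDNN_HALF=1/' Makefile
make clean && make
Note that CUDNN_HALF=1 only pays off on GPUs with Tensor Cores (Volta/Turing or newer); leave it at 0 otherwise.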
Upvotes: 7