Reputation: 125
I am trying to train YOLO on a custom dataset, and everything seems to run without errors, but it just isn't training.
I followed the tutorial at https://github.com/AlexeyAB/darknet twice, but I get the same result:
./darknet detector train data/obj.data cfg/yolo-obj.cfg yolov4.conv.137
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000
Total BFLOPS 59.563
avg_outputs = 489778
Loading weights from yolov4.conv.137...
seen 64, trained: 0 K-images (0 Kilo-batches_64)
Done! Loaded 137 layers from weights-file
Learning Rate: 0.001, Momentum: 0.949, Decay: 0.0005
Resizing, random_coef = 1.40
608 x 608
Create 64 permanent cpu-threads
mosaic=1 - compile Darknet with OpenCV for using mosaic=1
I also tried without the pre-trained weights, but this doesn't start the training process either:
./darknet detector train data/obj.data cfg/yolo-obj.cfg
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000
Total BFLOPS 59.563
avg_outputs = 489778
Learning Rate: 0.001, Momentum: 0.949, Decay: 0.0005
Resizing, random_coef = 1.40
608 x 608
Create 64 permanent cpu-threads
mosaic=1 - compile Darknet with OpenCV for using mosaic=1
What am I missing?
Upvotes: 6
Views: 12594
Reputation: 1
Check your resource utilization when you start training, and see whether your RAM usage exceeds what is available.
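A quick way to do that is to watch memory from a second terminal while training starts; a minimal sketch using standard tools (the nvidia-smi line only applies if you train on a GPU):
watch -n 1 free -h      # system RAM usage, refreshed every second
watch -n 1 nvidia-smi   # GPU memory usage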
If RAM is indeed the problem, then try this solution:
CFG parameters in the [net] section:
batch - number of samples (images, letters, ...) which will be processed in one batch
subdivisions - number of mini-batches in one batch; mini_batch size = batch/subdivisions, so the GPU processes mini_batch samples at once, and the weights are updated once per batch samples (1 iteration processes batch images)
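To make the arithmetic concrete, here is an illustrative [net] fragment (the values are examples, not a recommendation):
[net]
# mini_batch = batch/subdivisions = 64/16 = 4 images processed by the GPU at once;
# weights are updated once per 64 images (one iteration)
batch=64
subdivisions=16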
With reference to these definitions, I tried various combinations of mini-batch size, like:
batch=64, subdivisions=8
"OR"
batch=64, subdivisions=16
and so on...
I found that my Colab instance only works with mini_batch = 2, so I set subdivisions to half of batch, like:
batch=64
subdivisions=32
"OR"
batch=32
subdivisions=16
or any other such combination.
It also raises an error when I use
batch=1, subdivisions=1
Upvotes: 0
Reputation: 69
Use this to enable OpenCV:
$ git clone https://github.com/AlexeyAB/darknet.git
$ cd darknet
$ sed -i 's/OPENCV=0/OPENCV=1/' Makefile
https://github.com/AlexeyAB/darknet/blob/master/Makefile#L4
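Note that changing the Makefile alone is not enough; you still need to rebuild for the flag to take effect (assuming you build with the repository's Makefile):
$ make clean
$ make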
Upvotes: 0
Reputation: 41
My friend, I just solved this problem, and I think I've found the reason. If your train.txt/test.txt files are empty, this is the cause. Open "creating-train-and-test-txt-files.py" and edit it: search for the keyword "jpeg" (it appears in only two places) and change both occurrences to "jpg", then replace the file in your Google Drive. Finally, restart the Colab session. Your training will then no longer quit after "608 x 608 Create 64 permanent cpu-threads".
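If it helps, the same edit can be done in one command (a sketch, assuming the script filename above and that the mismatch is exactly "jpeg" vs. "jpg"); re-run the script afterwards so train.txt/test.txt are regenerated:
sed -i 's/jpeg/jpg/g' creating-train-and-test-txt-files.py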
Best wishes from China.
Upvotes: 4
Reputation: 23
The above error is mainly caused by empty train.txt and test.txt files. Please check these two files.
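A quick way to check, assuming the list-file paths named in your obj.data (commonly data/train.txt and data/test.txt):
wc -l data/train.txt data/test.txt   # both line counts should be non-zero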
Upvotes: 0
Reputation: 11208
How have you installed OpenCV?
For a simple fix, you can try this:
sudo apt install libopencv-dev python3-opencv
Also make sure you have cmake:
sudo apt install cmake
This should install OpenCV 3.2 and CMake 3.10 on your system. Then try running darknet.
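To confirm both are visible before you rebuild (a quick check; "opencv" is the pkg-config module name for OpenCV 3.x on Ubuntu):
pkg-config --modversion opencv   # should print something like 3.2.0
cmake --version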
Finally, change the Makefile:
OPENCV=1
Upvotes: 0
Reputation: 129
If you want to use OpenCV you need to re-compile Darknet, but first change the Makefile as follows:
OPENCV=1
If you don't need OpenCV, then do as @TaQuangTu suggested. Once you have fixed this line, just run the build.sh script again and it should work just fine.
I'd also suggest changing the following lines if you intend to train using a GPU:
GPU=1
CUDNN=1
CUDNN_HALF=1
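As a sketch, the whole set of edits plus the rebuild could look like this (assuming the stock AlexeyAB Makefile, whose flags all default to 0):
sed -i 's/OPENCV=0/OPENCV=1/; s/GPU=0/GPU=1/; s/CUDNN=0/CUDNN=1/; s/CUDNN_HALF=0/CUDNN_HALF=1/' Makefile
make clean && make
Note that CUDNN_HALF=1 only pays off on GPUs with Tensor Cores (Volta/Turing or newer); leave it at 0 otherwise.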
Upvotes: 7