Sener

Reputation: 335

How to select image sets for object detection and object tracking?

I have to count eggs on a conveyor belt. The eggs can be seen in various ways.

Small, large, odd-shaped, dirty, cracked, broken, broken and empty inside, broken with liquid still inside, lying next to chicken feathers, many eggs touching each other, and sometimes eggs sitting on top of a group of other eggs.

My challenge is to count the eggs as accurately as possible and, if possible, also to classify/count the abnormal eggs mentioned above.

I already have a solution running on a Jetson Nano. It counts the eggs by finding contours against a relatively dark background (background subtraction). It does a reasonably good job to some degree, although it is slow.

My question(s):

Now I want to do this with deep-learning models, combining object detection and object tracking in a single pipeline. This effort can still be considered an experiment for me.

First things first, I need some image sets, so I need some advice on that.

The eggs will always be on a conveyor belt, coming along together with others of a very similar type (in terms of color, shape and size).

What I am not sure about is where/how to take the shots. Do I have to take the shots in the object's natural environment/background? And how: should I take a shot of each possible egg appearance listed above by putting the eggs on the conveyor belt one by one, changing their orientation a bit each time?

Or should I take each shot one by one against a white background, again changing the orientation a bit each time?

A sample appearance from the conveyor belt:

[image: sample view of eggs on the conveyor belt]

Upvotes: 2

Views: 479

Answers (1)

Zabir Al Nazi Nabil

Reputation: 11218

1. First, create a dataset of the eggs with bounding boxes (around 1k images, then use augmentation to generate a few thousand more; see the sketch after the tool list). You can use the following tools for annotation:

https://github.com/developer0hye/Yolo_Label (works great, but only for Windows)

https://github.com/AlexeyAB/Yolo_mark

https://github.com/heartexlabs/label-studio (this is a more complex annotation tool for many other tasks)
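Not from the tools above, just a minimal augmentation sketch, assuming YOLO-format labels (class x_center y_center width height, all normalized to 0-1); the file names are placeholders. It shows a horizontal flip plus a brightness shift, with the boxes updated to match:

```python
import cv2
import numpy as np

def load_yolo_labels(label_path):
    """Read YOLO-format labels: class x_center y_center width height (normalized)."""
    boxes = []
    with open(label_path) as f:
        for line in f:
            cls, x, y, w, h = line.split()
            boxes.append([int(cls), float(x), float(y), float(w), float(h)])
    return boxes

def hflip_with_boxes(image, boxes):
    """Horizontally flip the image and mirror the x-centers of the boxes."""
    flipped = cv2.flip(image, 1)
    flipped_boxes = [[c, 1.0 - x, y, w, h] for c, x, y, w, h in boxes]
    return flipped, flipped_boxes

def jitter_brightness(image, delta=30):
    """Add a random brightness offset; the boxes are unchanged."""
    shift = np.random.randint(-delta, delta + 1)
    return np.clip(image.astype(np.int16) + shift, 0, 255).astype(np.uint8)

# Hypothetical file names, for illustration only.
img = cv2.imread("eggs_0001.jpg")
boxes = load_yolo_labels("eggs_0001.txt")
aug_img, aug_boxes = hflip_with_boxes(jitter_brightness(img), boxes)
cv2.imwrite("eggs_0001_aug.jpg", aug_img)
```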

2. As this is an embedded solution and the eggs will be moving, you need something with good FPS and also a reasonable mAP. From my experiments with many object detection models, I have found YOLOv3 (darknet) to be the most stable.

I would suggest going with darknet YOLO: it is written in C/C++, you won't need to write any major code, and it will be fast and accurate.

https://pjreddie.com/darknet/yolo/

Use this repo if you're on Linux: https://github.com/pjreddie/darknet

Use this one if you're on Windows: https://github.com/AlexeyAB/darknet

3. Before training, you need to find the optimal anchor sizes for your dataset. I wrote a simple k-means script to find the anchor sizes for any YOLO-compatible dataset; a rough sketch of the idea follows the link.

https://github.com/zabir-nabil/yolov3-anchor-clustering
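The repo above is the full version; the snippet below is only a plain sketch of the same idea (k-means over normalized box widths/heights, scaled to the 416x416 input), not the repo's code, and the `labels` directory is a placeholder:

```python
import glob
import numpy as np
from sklearn.cluster import KMeans

def collect_wh(label_dir):
    """Gather (width, height) pairs from YOLO-format label files (normalized 0-1)."""
    wh = []
    for path in glob.glob(f"{label_dir}/*.txt"):
        with open(path) as f:
            for line in f:
                _, _, _, w, h = map(float, line.split())
                wh.append([w, h])
    return np.array(wh)

wh = collect_wh("labels")                        # hypothetical directory of label files
kmeans = KMeans(n_clusters=9, n_init=10, random_state=0).fit(wh)
anchors = np.round(kmeans.cluster_centers_ * 416).astype(int)   # scale to the 416x416 input
anchors = anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]    # yolov3 wants 9 anchors, smallest first
print(", ".join(f"{w},{h}" for w, h in anchors))
```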

4. I did some minor customization (like sending OpenCV/numpy arrays directly to the model) to run the darknet Python API faster on a server (TensorFlow model server with both REST and gRPC). I also wrote a Flask server for it. You can find it here (a minimal serving sketch follows the link):

https://github.com/zabir-nabil/tf-model-server4-yolov3
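The linked repo is the full implementation; below is just a minimal Flask sketch of the same idea. `count_eggs()` is a hypothetical stand-in for whatever darknet binding or inference call you end up using:

```python
import cv2
import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)

def count_eggs(frame):
    """Hypothetical placeholder: run the YOLOv3 detector and return the number of boxes."""
    detections = []  # replace with the actual darknet / OpenCV-DNN inference call
    return len(detections)

@app.route("/count", methods=["POST"])
def count():
    # Decode the uploaded image directly into a numpy array (no temp file on disk).
    data = np.frombuffer(request.files["image"].read(), dtype=np.uint8)
    frame = cv2.imdecode(data, cv2.IMREAD_COLOR)
    if frame is None:
        return jsonify({"error": "could not decode image"}), 400
    return jsonify({"egg_count": count_eggs(frame)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```

It can be exercised with a multipart POST of a single image file to /count.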

About image resolution: the default input dimension for YOLOv3 is (416, 416), which should be enough for your case. So, take the images with the same or a similar camera to the one you'll use in the actual deployment environment. A Pi camera should be enough; you can use better cameras too, but in the end all images have to be resized to (416, 416). A minimal preprocessing sketch follows.
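As an illustration of the (416, 416) input (not part of the answer's repos), one way to run trained darknet weights from Python is OpenCV's DNN module; the cfg/weights/image file names here are placeholders:

```python
import cv2

# Placeholder paths to the trained darknet config and weights.
net = cv2.dnn.readNetFromDarknet("yolov3-eggs.cfg", "yolov3-eggs.weights")
out_names = net.getUnconnectedOutLayersNames()

frame = cv2.imread("frame.jpg")  # or a frame grabbed from the Pi camera
# Whatever the camera resolution, the frame is scaled to the 416x416 network input here.
blob = cv2.dnn.blobFromImage(frame, scalefactor=1 / 255.0, size=(416, 416),
                             swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(out_names)  # raw YOLO detections, one array per output layer
```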

This is a two-class problem, so for the positive class you need slightly more images. Here's a rough estimate of how you can generate the samples. Let's say the range of your egg-counting model will be 0-25.

20% of the images with 0 eggs. The remaining 80% should form a roughly uniform (or flat Gaussian) distribution, meaning that if 80% == 1000 images, the number of images with 1 egg will be 1000/range = 1000/(25-1+1) = 1000/25 = 40, and the same for the other counts (2-25). A small sketch of this split follows.
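A tiny sketch of that split, assuming a dataset of 1250 images (an assumed size, chosen so the 80% portion is 1000) and the 0-25 egg range:

```python
total_images = 1250                      # assumed dataset size
zero_egg = int(0.20 * total_images)      # 20% with no eggs -> 250
rest = total_images - zero_egg           # 1000 images spread over counts 1..25
per_count = rest // 25                   # 1000 / 25 = 40 images per egg count

plan = {0: zero_egg}
plan.update({count: per_count for count in range(1, 26)})
print(plan)  # {0: 250, 1: 40, 2: 40, ..., 25: 40}
```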

For brightness, contrast, and lighting, just go with conditions very close to the actual deployment scenario; augmentation will take care of the rest. YOLOv3 is quite robust, so you don't need to worry about background noise too much.

There is no major difference between image formats; .jpg will usually give you a smaller file size, which makes storage easier.

Upvotes: 3
