Reputation: 459
I want to use some kind of machine learning to measure the diameter of, and count, potatoes passing on a conveyor belt. I've started off with OpenCV and begun training a cascade classifier with positive and negative images; this is a screenshot from a short video I've captured:
In this frame it only manages to identify some of the potatoes, but I assume I can add more positive and negative images to build a better classifier.
So, to my questions:
Do you think that I'm on right track?
And any idea how I shall continue with measuring the diameter?
How shall I keep track each potato so it is only counted once?
(This is the code I run to identify the potatoes)
import numpy as np
import cv2

pot_cascade = cv2.CascadeClassifier('cascade.xml')
cap = cv2.VideoCapture('potatoe_video.mp4')

while True:
    ret, img = cap.read()
    if not ret:  # stop when the video ends instead of crashing on an empty frame
        break
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # scaleFactor=1.3, minNeighbors=5 -- the original positional (5, 5) makes
    # the scale step far too coarse for the detector
    potatoes = pot_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    for (x, y, w, h) in potatoes:
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 255, 0), 2)
    cv2.imshow('img', img)
    k = cv2.waitKey(30) & 0xff
    if k == 27:  # Esc quits
        break

cap.release()
cv2.destroyAllWindows()
Upvotes: 2
Views: 589
Reputation: 10869
Do you think that I'm on right track?
Yes and no. Machine learning is definitely a suitable tool for detecting and tracking potatoes, but the results shown so far are not very good; there is still a long way to go.
And any idea how I shall continue with measuring the diameter?
If you are only interested in measuring the average diameter of the potatoes, this might be much easier than actually tracking individual potatoes.
First you need to estimate the perspective (position of the transport belt relative to the camera), so you know how the image space relates to the real space on the transport belt. What may help you in finding out the real-space to image-space transformation is putting a regular grid (a sheet with regular black dots for example) on the belt and observe its movement. Mounting the camera in top-view mode may make things much simpler there.
Measuring only the average diameter means you do not need to get every potato right: a standard edge-detection algorithm might suffice to detect most of the potatoes; you can then fit ellipses to them, build a histogram of the diameters, and take the median value.
Alternatively, compute an auto-correlation of the luminosity along the belt (it is dark between the potatoes, while the potatoes themselves reflect the light) and fit the width of the peaks in the auto-correlation. Or take the Fourier transform of the images and normalize the average amplitude in the frequency band where you expect the potato sizes against the average amplitude of a reference band. Both approaches work best if you can calibrate them with potatoes of known size; that is, record potatoes of several known sizes and thereby calibrate the auto-correlation or FFT based measure.
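The auto-correlation variant can be sketched in a few lines (the half-maximum criterion for the peak width is my choice here, not something prescribed; a proper fit of the peak shape would be more accurate):

```python
import numpy as np

def autocorr_width(profile):
    """Width (in pixels) of the central auto-correlation peak of a 1-D
    luminosity profile taken along the belt direction.

    Returns the first lag where the normalized auto-correlation drops
    below half maximum, a crude proxy for the typical potato size.
    """
    x = profile - profile.mean()
    ac = np.correlate(x, x, mode='full')
    ac = ac[ac.size // 2:]   # keep non-negative lags only
    ac = ac / ac[0]          # normalize so lag 0 == 1
    below = np.flatnonzero(ac < 0.5)
    return int(below[0]) if below.size else None
```

Converting that pixel width to millimetres again requires the perspective calibration, or the direct calibration with potatoes of known size mentioned above.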
I would though go for the machine learning approach because it potentially gives you single potato measurements.
How shall I keep track each potato so it is only counted once?
Every potato is (potentially) visible for the time it takes the belt to carry it through the field of view. Once you have established the perspective, you can map an image position and its recording time to a position on the belt; as long as a potato does not move relative to the belt, you can identify it over the whole time range in which it is visible.
You basically must know (estimate) the movement of the transport belt and then you can "undo" it (computationally) and average over all the incidences where you saw the same potato.
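A sketch of that "undo the motion" idea, assuming detections have already been converted to belt coordinates (mm) via the perspective step and the belt speed is known; the tolerance value is an arbitrary assumption:

```python
import numpy as np

def assign_ids(detections, belt_speed_mm_s, tol_mm=20.0):
    """Group detections of the same potato by undoing the belt motion.

    detections: list of (t_seconds, x_mm, y_mm) in belt coordinates,
    with x along the direction of travel. Returns one potato ID per
    detection, so the potato count is the number of distinct IDs.
    """
    anchors = []  # (x0, y) position each known potato had at t = 0
    ids = []
    for t, x, y in detections:
        x0 = x - belt_speed_mm_s * t  # shift back to the t = 0 position
        for pid, (ax, ay) in enumerate(anchors):
            if abs(ax - x0) < tol_mm and abs(ay - y) < tol_mm:
                ids.append(pid)   # same potato seen again
                break
        else:
            anchors.append((x0, y))   # a potato not seen before
            ids.append(len(anchors) - 1)
    return ids
```

Averaging the diameter measurements over all detections that share an ID then gives one measurement per potato, each potato counted once.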
I recommend the following general workflow:
Upvotes: 1