Reputation: 121
I am creating a Deep Learning program that is capable of identifying hand digit gestures. I have finished training the model and now I need to use it on live video. So I am trying to create an OpenCV program where the user places his/her hand in a region of interest (a box) in the frame, and that ROI is fed to my CNN model. Based on the gesture, my CNN model will respond.
I wrote this code, where I managed to create a 300x300 square (my ROI), but how do I use that region of interest as input to my CNN model? I want only that square part to be fed to my model.
import traceback
import cv2
import numpy as np
import math

cam = cv2.VideoCapture(0)
while(1):
    try:
        ret, frame = cam.read()
        frame = cv2.flip(frame, 1)
        cv2.rectangle(frame, (200, 100), (500, 400), (0, 255, 0), 2)
        cv2.imshow('curFrame', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    except Exception:
        traceback.print_exc()
        pass
cam.release()
cv2.destroyAllWindows()
**Additional:**
ROI = frame[100:200, 100:200]
What does that line mean?
Upvotes: 5
Views: 5396
Reputation: 11420
It is actually quite simple to create a ROI from a frame, and basically you already wrote it at the end of your question (ROI = frame[100:200, 100:200]).
Let's assume this is your hand with your ROI after you ran the code above (image from the internet):
Now, if you want what is inside the ROI as a separate image, you can use:
ROI = frame[100:400, 200:500] # according to the coordinates of your rectangle
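As a side note on why the coordinates flip: NumPy indexes an OpenCV image as [row, column], i.e. [y, x], while cv2.rectangle takes (x, y) corners, so the corners (200, 100) and (500, 400) become the slice frame[100:400, 200:500]. A minimal sketch of that mapping, assuming a standard BGR frame read from cv2.VideoCapture:
import cv2

cam = cv2.VideoCapture(0)
ret, frame = cam.read()
cam.release()

if ret:
    # Rectangle corners as passed to cv2.rectangle: (x1, y1) and (x2, y2)
    x1, y1 = 200, 100
    x2, y2 = 500, 400

    # NumPy slicing is [rows, columns] = [y, x], so y comes first
    roi = frame[y1:y2, x1:x2]
    print(roi.shape)  # expected (300, 300, 3) for a 300x300 BGR crop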
However, this will leave the green rectangle visible in the extracted image as well (see image below), so you need to create a copy from the original image.
Here is what it looks like without copying from the original:
Also, some algorithms behave kind of weird with this NumPy slice view, so it is better to make a copy. The code should look like this in the end:
import cv2
import numpy as np

cam = cv2.VideoCapture(0)
if not cam.isOpened():
    print("Could not open cam")
    exit()

while(1):
    ret, frame = cam.read()
    if ret:
        frame = cv2.flip(frame, 1)
        # Draw the rectangle on a copy so the original frame stays clean
        display = cv2.rectangle(frame.copy(), (200, 100), (500, 400), (0, 255, 0), 2)
        cv2.imshow('curFrame', display)
        ROI = frame[100:400, 200:500].copy()
        cv2.imshow('Current Roi', ROI)
        if cv2.waitKey(10) & 0xFF == ord('q'):
            break

cam.release()
cv2.destroyAllWindows()
Notice that I added checks for whether the camera is opened and for ret, which will tell you if there was any problem opening the webcam or if a frame could not be read.
This will be the resulting image in ROI:
This can be saved with cv2.imwrite or passed to any other algorithm you have. If you have any questions, feel free to ask.
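Since the goal is to feed the ROI to a CNN, here is a minimal sketch of that last step. The Keras model file name gesture_model.h5, the 64x64 input size, and the grayscale preprocessing are assumptions for illustration; they have to match however your model was actually trained:
import cv2
import numpy as np
from tensorflow.keras.models import load_model

# Hypothetical model file and input size -- adjust to your own training setup
model = load_model('gesture_model.h5')
INPUT_SIZE = (64, 64)

def predict_gesture(roi):
    # Preprocess the ROI the same way the training data was preprocessed
    gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
    resized = cv2.resize(gray, INPUT_SIZE)
    normalized = resized.astype('float32') / 255.0
    # Add batch and channel dimensions: (1, height, width, 1)
    batch = normalized.reshape(1, resized.shape[0], resized.shape[1], 1)
    scores = model.predict(batch)
    return int(np.argmax(scores))
Calling predict_gesture(ROI) inside the capture loop (for example every few frames) then gives the predicted class index for the current hand gesture.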
Upvotes: 5