Is it possible to get the count of objects using Google's Vision API or Amazon's Rekognition?

I have been trying to get the count of objects in an image or video using AWS Rekognition and Google's Vision, but haven't found a way to do it. However, Google's Vision site does have a section, 'Insight from the Images', where the quantity apparently seems to have been captured.

Attached is a screenshot from that page.

Can someone please tell me whether this is possible with Google's Vision, or suggest any other API that can count the objects in an image? Thanks.

Edit:

For example, for the attached image, the count returned should be 10 cars. As Torry Yang suggested in his answer, the number of label annotations might give the required count, but that does not seem to be the case: there are 18 label annotations. The returned object looks something like this:

"labelAnnotations": [
  { "mid": "/m/0k4j", "description": "car", "score": 0.98658943, "topicality": 0.98658943 },
  { "mid": "/m/012f08", "description": "motor vehicle", "score": 0.9631113, "topicality": 0.9631113 },
  { "mid": "/m/07yv9", "description": "vehicle", "score": 0.9223521, "topicality": 0.9223521 },
  { "mid": "/m/01w71f", "description": "personal luxury car", "score": 0.8976857, "topicality": 0.8976857 },
  { "mid": "/m/068mqj", "description": "automotive design", "score": 0.8736646, "topicality": 0.8736646 },
  { "mid": "/m/012mq4", "description": "sports car", "score": 0.8418799, "topicality": 0.8418799 },
  { "mid": "/m/01lcwm", "description": "luxury vehicle", "score": 0.7761523, "topicality": 0.7761523 },
  { "mid": "/m/06j11d", "description": "performance car", "score": 0.76816446, "topicality": 0.76816446 },
  { "mid": "/m/03vnt4", "description": "mid size car", "score": 0.75732976, "topicality": 0.75732976 },
  { "mid": "/m/03vntj", "description": "full size car", "score": 0.6855145, "topicality": 0.6855145 },
  { "mid": "/m/0h8ls87", "description": "automotive exterior", "score": 0.66056395, "topicality": 0.66056395 },
  { "mid": "/m/014f__", "description": "supercar", "score": 0.592226, "topicality": 0.592226 },
  { "mid": "/m/02swz_", "description": "compact car", "score": 0.5807265, "topicality": 0.5807265 },
  { "mid": "/m/0h6dlrc", "description": "bmw", "score": 0.5801241, "topicality": 0.5801241 },
  { "mid": "/m/01h80k", "description": "muscle car", "score": 0.55745816, "topicality": 0.55745816 },
  { "mid": "/m/021mp2", "description": "sedan", "score": 0.5522745, "topicality": 0.5522745 },
  { "mid": "/m/0369ss", "description": "city car", "score": 0.52938646, "topicality": 0.52938646 },
  { "mid": "/m/01d1dj", "description": "coupé", "score": 0.50642073, "topicality": 0.50642073 }
]
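A quick way to confirm that these 18 entries are distinct labels rather than detected objects is to tally the descriptions. The sketch below is a hypothetical check: RESPONSE_JSON is a trimmed stand-in for the response above (only the description fields are kept), and label_summary is an illustrative helper, not part of any Vision client library.

```python
import json
from collections import Counter

# Hypothetical stand-in for the label detection response above; only the
# "description" fields are kept for brevity.
RESPONSE_JSON = """
{"labelAnnotations": [
  {"description": "car"}, {"description": "motor vehicle"},
  {"description": "vehicle"}, {"description": "personal luxury car"},
  {"description": "automotive design"}, {"description": "sports car"},
  {"description": "luxury vehicle"}, {"description": "performance car"},
  {"description": "mid size car"}, {"description": "full size car"},
  {"description": "automotive exterior"}, {"description": "supercar"},
  {"description": "compact car"}, {"description": "bmw"},
  {"description": "muscle car"}, {"description": "sedan"},
  {"description": "city car"}, {"description": "coupé"}
]}
"""


def label_summary(response_json):
    """Return (number of label entries, occurrences per description)."""
    labels = json.loads(response_json)["labelAnnotations"]
    per_label = Counter(label["description"] for label in labels)
    return len(labels), per_label


total, per_label = label_summary(RESPONSE_JSON)
print(total)                    # 18 label entries in total
print(max(per_label.values()))  # 1: no description is repeated
```

Since no description repeats, the length of labelAnnotations is the number of labels the image matched, not the number of cars in it.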

Upvotes: 3

Answers (2)

Mausam Sharma

Reputation: 892

Neither Google Vision nor AWS Rekognition supports object counting in a photograph.

https://forums.aws.amazon.com/thread.jspa?threadID=254814

However, you can count the number of faces in an image with both Vision and Rekognition.

In AWS Rekognition, the DetectFaces API returns a JSON response like this:

HTTP/1.1 200 OK
Content-Type: application/x-amz-json-1.1
Date: Wed, 04 Jan 2017 23:37:03 GMT
x-amzn-RequestId: b1827570-d2d6-11e6-a51e-73b99a9bb0b9
Content-Length: 1355
Connection: keep-alive


{
   "FaceDetails":[
      {
         "BoundingBox":{
            "Height":0.18000000715255737,
            "Left":0.5555555820465088,
            "Top":0.33666667342185974,
            "Width":0.23999999463558197
         },
         "Confidence":100.0,
         "Landmarks":[
            {
               "Type":"eyeLeft",
               "X":0.6394737362861633,
               "Y":0.40819624066352844
            },
            {
               "Type":"eyeRight",
               "X":0.7266660928726196,
               "Y":0.41039225459098816
            },
            {
               "Type":"nose",
               "X":0.6912462115287781,
               "Y":0.44240960478782654
            },
            {
               "Type":"mouthLeft",
               "X":0.6306198239326477,
               "Y":0.46700039505958557
            },
            {
               "Type":"mouthRight",
               "X":0.7215608954429626,
               "Y":0.47114261984825134
            }
         ],
         "Pose":{
            "Pitch":4.050806522369385,
            "Roll":0.9950747489929199,
            "Yaw":13.693790435791016
         },
         "Quality":{
            "Brightness":37.60169982910156,
            "Sharpness":80.0
         }
      },
      {
         "BoundingBox":{
            "Height":0.16555555164813995,
            "Left":0.3096296191215515,
            "Top":0.7066666483879089,
            "Width":0.22074073553085327
         },
         "Confidence":99.99998474121094,
         "Landmarks":[
            {
               "Type":"eyeLeft",
               "X":0.3767718970775604,
               "Y":0.7863991856575012
            },
            {
               "Type":"eyeRight",
               "X":0.4517287313938141,
               "Y":0.7715709209442139
            },
            {
               "Type":"nose",
               "X":0.42001065611839294,
               "Y":0.8192070126533508
            },
            {
               "Type":"mouthLeft",
               "X":0.3915625810623169,
               "Y":0.8374140858650208
            },
            {
               "Type":"mouthRight",
               "X":0.46825936436653137,
               "Y":0.823401689529419
            }
         ],
         "Pose":{
            "Pitch":-16.320178985595703,
            "Roll":-15.097439765930176,
            "Yaw":-5.771541118621826
         },
         "Quality":{
            "Brightness":31.440860748291016,
            "Sharpness":60.000003814697266
         }
      }
   ],
   "OrientationCorrection":"ROTATE_0"
}

You can then count the entries in FaceDetails (each one has its own BoundingBox) to get the number of faces in the image.
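As a minimal sketch, assuming the JSON body above has already been parsed into a Python dict, counting faces is just counting FaceDetails entries. count_faces and its optional min_confidence threshold are hypothetical helpers, not part of the Rekognition API, and RESPONSE_BODY is a trimmed copy of the response above with the Landmarks, Pose and Quality fields dropped.

```python
import json

# Trimmed copy of the DetectFaces response body shown above.
RESPONSE_BODY = """
{
  "FaceDetails": [
    {"BoundingBox": {"Height": 0.18000000715255737, "Left": 0.5555555820465088,
                     "Top": 0.33666667342185974, "Width": 0.23999999463558197},
     "Confidence": 100.0},
    {"BoundingBox": {"Height": 0.16555555164813995, "Left": 0.3096296191215515,
                     "Top": 0.7066666483879089, "Width": 0.22074073553085327},
     "Confidence": 99.99998474121094}
  ],
  "OrientationCorrection": "ROTATE_0"
}
"""


def count_faces(response, min_confidence=0.0):
    """Count faces in a parsed DetectFaces response.

    Each FaceDetails entry describes one face (one BoundingBox), so the
    count is the number of entries whose Confidence meets the threshold.
    """
    return sum(
        1
        for face in response.get("FaceDetails", [])
        if face.get("Confidence", 0.0) >= min_confidence
    )


response = json.loads(RESPONSE_BODY)
print(count_faces(response))                           # 2
print(count_faces(response, min_confidence=99.99999))  # 1
```

With boto3, `boto3.client("rekognition").detect_faces(Image={"Bytes": image_bytes})` returns a dict with the same FaceDetails key, so its result can be passed to count_faces directly.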

Also, if you want to count objects in a photograph, you can train a custom machine learning model on AWS SageMaker. Example: https://github.com/cosmincatalin/object-counting-with-mxnet-and-sagemaker

Upvotes: 0

Torry Yang

Reputation: 375

On Google Cloud Vision, you should be able to get a count. For example, to count the number of faces with Python, you can do this:

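import io

from google.cloud import vision


def detect_faces(path):
    """Detects faces in an image and prints how many were found."""
    client = vision.ImageAnnotatorClient()

    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    # google-cloud-vision >= 2.0; older releases used vision.types.Image
    image = vision.Image(content=content)

    response = client.face_detection(image=image)
    faces = response.face_annotations
    # One FaceAnnotation per detected face, so the length is the face count
    print(len(faces))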

Note the last line: the face count is simply the length of the returned list of annotations. You should be able to count the results the same way in every supported client language.
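If you only want to count faces the API is reasonably sure about, you can filter on each annotation's detection_confidence before counting. This is a minimal sketch under assumptions: count_confident_faces and its 0.5 default threshold are hypothetical, and SimpleNamespace objects stand in for real FaceAnnotation results so the example runs without calling the API.

```python
from types import SimpleNamespace


def count_confident_faces(face_annotations, min_confidence=0.5):
    """Count face annotations whose detection_confidence meets a threshold.

    Accepts response.face_annotations from face_detection, or any objects
    exposing a detection_confidence attribute. The 0.5 default is an
    arbitrary, hypothetical cut-off.
    """
    return sum(
        1 for face in face_annotations
        if face.detection_confidence >= min_confidence
    )


# Hypothetical stand-ins for FaceAnnotation messages (no API call made here)
fake_faces = [
    SimpleNamespace(detection_confidence=0.98),
    SimpleNamespace(detection_confidence=0.91),
    SimpleNamespace(detection_confidence=0.32),
]
print(len(fake_faces))                    # 3: every returned face
print(count_confident_faces(fake_faces))  # 2: faces at or above 0.5
```

Passing response.face_annotations from detect_faces instead of fake_faces gives the filtered count for a real image.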

Here is how you'd get a count for each label.

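import io

from google.cloud import vision


def detect_labels(path):
    """Detects labels in the file and counts each label description."""
    client = vision.ImageAnnotatorClient()

    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    # google-cloud-vision >= 2.0; older releases used vision.types.Image
    image = vision.Image(content=content)

    response = client.label_detection(image=image)
    labels = response.label_annotations

    count = {}
    for label in labels:
        # Key on the description string so labels are grouped by name
        key = label.description
        if key in count:
            count[key] += 1
        else:
            count[key] = 1
    return count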

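In this second example, count is a dictionary mapping each label description to how many times it appears in the response. Note, however, that label detection typically returns each distinct label only once per image, so these counts will usually all be 1: they count labels, not object instances, which is consistent with the 18 labels (rather than 10 cars) reported in the question's edit.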

Upvotes: 2
