James Parker

Reputation: 2242

Searching OpenCV ORB descriptors using ElasticSearch

I'm storing ORB Descriptors in an ElasticSearch Vector Field and then performing kNN Searches using the new API in ElasticSearch 8.0+.


import urllib.request

import cv2
import numpy as np

# Read the image in via URL and decode it as grayscale
resp = urllib.request.urlopen(URL)
image = np.asarray(bytearray(resp.read()), dtype="uint8")
# imdecode expects an IMREAD_* flag; COLOR_BGR2GRAY is a cvtColor code
image = cv2.imdecode(image, cv2.IMREAD_GRAYSCALE)

# Only looking for 16 features since ElasticSearch
# will not let us index a vector larger than 1024
orb = cv2.ORB_create(nfeatures=16)
kp, des = orb.detectAndCompute(image, None)

# Flatten for saving in ElasticSearch
dsc = des.flatten()

# Finally index it in ElasticSearch

This is my mapping for ElasticSearch

mappings: {
   dynamic: 'true',
   properties: {
      image_dense_vector: {
        type: 'dense_vector',
        dims: 1024,
        index: true,
        similarity: 'cosine'
      }
    }
}

And finally this is my search query.

body = {
  "field":"image_dense_vector",
  "query_vector":dense_vector,
  "k":20,
  "num_candidates":10000
}
res = self.es.knn_search(index=self.index, knn=body)

The data set consists of ~34,000 records.

This returns correct results if the image passed in is an exact match. But if the image differs even slightly, the results that come back are not even close to accurate.

Any suggestions?

Upvotes: 1

Views: 165

Answers (1)

Christoph Rackwitz

Reputation: 15506

First, ponder the ORB paper.

ORB uses the BRIEF descriptor. BRIEF emits binary vectors.

detectAndCompute may give you an array of uint8 but those byte values aren't scalars. Those bytes merely hold the bits.
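To see why the byte values can't be treated as scalars, here is a minimal sketch in plain Python. The helper name `hamming_bits` is made up for illustration and is not part of OpenCV. It compares the numeric difference of two byte values with the number of bits in which they differ.

```python
def hamming_bits(a: int, b: int) -> int:
    """Number of bit positions in which two byte values differ."""
    return bin(a ^ b).count("1")

# Numerically adjacent bytes can differ in every bit...
print(abs(127 - 128), hamming_bits(127, 128))  # 1 8
# ...and numerically distant bytes can differ in a single bit.
print(abs(0 - 128), hamming_bits(0, 128))      # 128 1
```

Any metric that works on the byte values (cosine, Euclidean) sees 127 and 128 as nearly identical, while as bit patterns they are as different as two bytes can be.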

For binary vectors, you need to use the Hamming distance. Cosine similarity on the raw bytes doesn't work. Not unless you blow your data up and interpret each of the 32*8=256 bits of a descriptor as its own scalar.
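A rough sketch of both options with NumPy, assuming each descriptor is a row of 32 `uint8` values as returned by `detectAndCompute`. The function names `unpack_descriptor` and `hamming_distance` are made up for this example, and the descriptors are random stand-ins rather than real ORB output.

```python
import numpy as np

def unpack_descriptor(desc: np.ndarray) -> np.ndarray:
    """Expand a 32-byte descriptor into 256 float values of 0.0 or 1.0."""
    return np.unpackbits(desc.astype(np.uint8)).astype(np.float32)

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    """Count differing bits between two uint8 descriptors of equal length."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

# Random stand-ins for two rows of `des` (assumption: 32 bytes each)
rng = np.random.default_rng(0)
d1 = rng.integers(0, 256, size=32, dtype=np.uint8)
d2 = d1.copy()
d2[0] ^= 0b00000001  # flip a single bit

bits1, bits2 = unpack_descriptor(d1), unpack_descriptor(d2)
print(bits1.shape)                        # (256,)
print(hamming_distance(d1, d2))           # 1
# On 0/1 vectors, squared Euclidean distance equals the Hamming distance
print(int(np.sum((bits1 - bits2) ** 2)))  # 1
```

Since squared Euclidean distance on the unpacked 0/1 vectors equals the Hamming distance, indexing the unpacked bits with an `l2_norm` similarity instead of `cosine` should rank neighbours in Hamming order. That is a suggestion to try, not something verified against this data set.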

Upvotes: 0
