Reputation: 2242
I'm storing ORB Descriptors in an ElasticSearch Vector Field and then performing kNN Searches using the new API in ElasticSearch 8.0+.
import urllib.request
import cv2
import numpy as np
# Read the image in via URL and decode it as grayscale
resp = urllib.request.urlopen(URL)
image = np.asarray(bytearray(resp.read()), dtype="uint8")
# imdecode expects an IMREAD_* flag; COLOR_BGR2GRAY is a cvtColor conversion code
image = cv2.imdecode(image, cv2.IMREAD_GRAYSCALE)
# Only looking for 16 features since ElasticSearch
# will not let us index a vector larger than 1024
orb = cv2.ORB_create(nfeatures=16)
kp, des = orb.detectAndCompute(image, None)
# Flatten for saving in ElasticSearch
dsc = des.flatten()
# Finally index it in ElasticSearch
This is my mapping for ElasticSearch
mappings: {
  dynamic: 'true',
  properties: {
    image_dense_vector: {
      type: 'dense_vector',
      dims: 1024,
      index: true,
      similarity: 'cosine'
    }
  }
}
And finally this is my search query.
body = {
    "field": "image_dense_vector",
    "query_vector": dense_vector,
    "k": 20,
    "num_candidates": 10000
}
res = self.es.knn_search(index=self.index, knn=body)
The data set consists of ~34,000 records.
This returns results if the image passed in is an exact match, but if the image is off even slightly, the results that come back are not even close to accurate.
Any suggestions?
Upvotes: 1
Views: 165
Reputation: 15506
First, ponder the ORB paper.
ORB uses the BRIEF descriptor. BRIEF emits binary vectors.
detectAndCompute may give you an array of uint8, but those byte values aren't scalars. Those bytes merely hold the bits.
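To make the "bytes merely hold the bits" point concrete, here is a small sketch using NumPy only. The byte values are made up for illustration and do not come from a real descriptor; `differing_bits` is a hypothetical helper name.

```python
import numpy as np

# Made-up byte values, standing in for single bytes of an ORB descriptor row.
a = np.array([127], dtype=np.uint8)  # 0b01111111
b = np.array([128], dtype=np.uint8)  # 0b10000000
c = np.array([0], dtype=np.uint8)    # 0b00000000

def differing_bits(x, y):
    """Count the bit positions in which two uint8 arrays differ."""
    return int(np.unpackbits(np.bitwise_xor(x, y)).sum())

# 127 and 128 are numerically adjacent, yet all 8 of their bits differ.
bits_ab = differing_bits(a, b)
# 128 and 0 are numerically far apart, yet they differ in a single bit.
bits_bc = differing_bits(b, c)
print(bits_ab, bits_bc)
```

So any similarity that treats the bytes as numbers (cosine included) ranks "close" and "far" very differently from how the descriptor actually encodes them.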
For binary vectors, you need to use the Hamming distance. "Cosine similarity" doesn't work, not unless you blow your data up and interpret each descriptor's 32*8=256 bits as individual scalars.
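A rough sketch of both options, assuming descriptors shaped like ORB's default output (one row of 32 uint8 bytes per keypoint). The random `des` array below is only a stand-in for the real `detectAndCompute` result, and `hamming` and `unpack_descriptor` are hypothetical helper names.

```python
import numpy as np

# Stand-in for `des`: 16 descriptors of 32 bytes each (random, not real ORB output).
rng = np.random.default_rng(0)
des = rng.integers(0, 256, size=(16, 32), dtype=np.uint8)

def hamming(d1, d2):
    """Hamming distance: number of differing bits between two uint8 descriptors."""
    return int(np.unpackbits(np.bitwise_xor(d1, d2)).sum())

def unpack_descriptor(d):
    """'Blow up' one 32-byte descriptor into 256 scalars, each 0.0 or 1.0."""
    return np.unpackbits(d).astype(np.float32)

bits0 = unpack_descriptor(des[0])  # shape (256,)
bits1 = unpack_descriptor(des[1])

# On 0/1 vectors, squared Euclidean distance equals the Hamming distance.
sq_l2 = float(np.sum((bits0 - bits1) ** 2))
print(bits0.shape, hamming(des[0], des[1]), sq_l2)
```

Note that unpacking multiplies the size by 8: a single descriptor becomes 256 dims, so 16 flattened descriptors would be 4096 dims, well past the 1024 limit mentioned in the question. If the matching can be done outside ElasticSearch, OpenCV's `cv2.BFMatcher(cv2.NORM_HAMMING)` compares the packed uint8 descriptors directly.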
Upvotes: 0