Reputation: 1
I am trying to create an index with an indexer and a skillset in Azure AI Search, using a data source connected to ADLS Gen2. Going through the Import data wizard works fine and creates the index, indexer, and skillset with a few skills such as OCR and merge.
The problem arises when I add a vector field. I add it with the following config:
Type: Collection(Edm.Single); Retrievable; Dimensions: 1536. I create the vectorizer and algorithm without changing the default config.
Simply adding this field to the index causes the index to always return empty. The indexer runs and says it is successful, but the index will be empty.
As mentioned, running everything the same but without that field is fine and the index is populated with information. I have not yet mapped the vector field to anything, it is just there.
There is a warning on the indexer run that may have something to do with it: "This ADLS Gen2 indexer maps the property 'metadata_storage_path' to the index key, which may not reindex documents if directories are renamed. Update the 'LastModified' timestamps for all the blobs in the directory to ensure they get reindexed."
Is this a known bug?
Upvotes: 0
Views: 3987
Reputation: 211
In my experience with AI Search, you get two options: "Import data" and "Import and vectorize data". With the first option, the data source must already contain a vectorized field, which you then map to the vector field (Collection(Edm.Single)) of the index. This option needs external vectorizer code, and you populate the index using the push or pull API mechanism (https://learn.microsoft.com/en-us/azure/search/vector-search-how-to-create-index?tabs=config-2023-11-01%2Crest-2023-11-01%2Cpull%2Cportal-check-index). With the second option, the data is vectorized automatically, but blob storage is the only data source it offers, meaning you can only vectorize physical files (.json, .pdf, etc.) stored in containers.
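For the first option, here is a rough sketch of what a push payload with a precomputed vector might look like. It only builds and validates the JSON body and does not call the service. The field names `id`, `content`, and `contentVector` and the helper `build_upload_batch` are made up for illustration; the `value` / `@search.action` batch shape is how I recall the documents index REST call, so check it against the docs before relying on it.

```python
import json

# Matches the "Dimensions: 1536" setting from the question.
EXPECTED_DIMENSIONS = 1536


def build_upload_batch(docs, vector_field="contentVector"):
    """Wrap documents in a push-style batch body (hypothetical helper).

    Each document must already carry its embedding in `vector_field`;
    the embedding itself is produced by external code, not by the index.
    """
    actions = []
    for doc in docs:
        vector = doc[vector_field]
        if len(vector) != EXPECTED_DIMENSIONS:
            raise ValueError(
                f"document {doc.get('id')!r} has {len(vector)} dimensions, "
                f"expected {EXPECTED_DIMENSIONS}"
            )
        # mergeOrUpload inserts the document or updates it if the key exists.
        actions.append({"@search.action": "mergeOrUpload", **doc})
    return {"value": actions}


# Placeholder vector; a real one would come from an embedding model.
batch = build_upload_batch(
    [{"id": "1", "content": "hello", "contentVector": [0.0] * EXPECTED_DIMENSIONS}]
)
body = json.dumps(batch)  # request body for the push call
```

Checking the vector length before sending is just a convenience here: a mismatch against the field's configured dimensions is one of the easier mistakes to catch on the client side.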
Upvotes: 0
Reputation: 41
Hmm, I can't repro the issue. I used the Import data wizard to index 4 PDFs from ADLS Gen2, and then added a vector field, type Collection(Edm.Single), dimensions 1536, with a vector profile (this step is required), but no vectorizer (it's optional). I reset and reran the indexer, and I got the same content the second time around. Having an empty vector field in the index didn't break the ADLS Gen2 indexer for me or generate an empty index.
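For reference, this is roughly the shape of the index fragment I mean, written as a small Python sketch that only builds the JSON. The names `contentVector`, `vector-profile`, and `hnsw-default` and the helper `vector_index_fragment` are placeholders I picked, and the property names (`dimensions`, `vectorSearchProfile`, `vectorSearch.profiles`, `vectorSearch.algorithms`) are as I recall them from the 2023-11-01 REST API, so treat this as an assumption to verify rather than the exact definition the portal generates.

```python
import json


def vector_index_fragment(
    field_name="contentVector",
    dimensions=1536,
    profile_name="vector-profile",
    algorithm_name="hnsw-default",
):
    """Build the vector-related part of an index definition (illustrative only)."""
    field = {
        "name": field_name,
        "type": "Collection(Edm.Single)",
        "searchable": True,
        "retrievable": True,
        "dimensions": dimensions,
        # The field points at a profile; this is the required step.
        "vectorSearchProfile": profile_name,
    }
    vector_search = {
        "algorithms": [{"name": algorithm_name, "kind": "hnsw"}],
        # The profile points at an algorithm. No vectorizer is configured here.
        "profiles": [{"name": profile_name, "algorithm": algorithm_name}],
    }
    return {"fields": [field], "vectorSearch": vector_search}


fragment = vector_index_fragment()
print(json.dumps(fragment, indent=2))
```

The point of the sketch is the chain of references: the field names a profile, and the profile names an algorithm. The vectorizer is left out entirely, which matches the setup where the field stays empty.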
Upvotes: 0