Steve Drake

Reputation: 2048

Azure Search - AzureSearch_SkipContent

I have some very large blobs, so I set the AzureSearch_SkipContent metadata on each such blob with the following code:

// 134217728 bytes = 128 MiB
if (b.Properties.Length >= 134217728)
{
    // Ask the indexer to skip content extraction for this blob
    b.Metadata["AzureSearch_SkipContent"] = "true";
    await b.SetMetadataAsync();
}

But when I review the warnings and errors, I can see that the indexer attempted to index the content even though I asked it to skip it. The error I see is below (it is listed under errors, so I guess nothing is going to be indexed for this blob):

{
    "key": null,
    "errorMessage": "The blob '113443f46d1b184650bf4b0d5b0b3806055c43558a676b778de13f1b7ef4da93' has the size of 218285352 bytes, which exceeds the maximum size for document extraction for your current service tier."
},

If I look at this blob in Storage Explorer I see:

[screenshot of the blob in Storage Explorer]

Upvotes: 1

Views: 263

Answers (2)

Eugene Shvets

Reputation: 4671

UPDATE Jan 3, 2018

To make this scenario work gracefully, we are adding the indexStorageMetadataOnlyForOversizedDocuments indexer configuration setting. It takes a bool value and is false by default, so set it to true in the indexer configuration to enable it. This is fresh off the presses and will be deployed in production worldwide by January 19.
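
As a rough sketch of where this setting goes, the fragment below shows an indexer definition with the flag enabled under parameters.configuration. The indexer, data source, and index names are hypothetical placeholders, and the rest of your indexer definition may look different:

```json
{
  "name": "my-blob-indexer",
  "dataSourceName": "my-blob-datasource",
  "targetIndexName": "my-index",
  "parameters": {
    "configuration": {
      "indexStorageMetadataOnlyForOversizedDocuments": true
    }
  }
}
```

With this enabled, blobs over the size limit for your tier should be indexed with their storage metadata only, instead of being reported as errors.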

ORIGINAL RESPONSE

Both "true" and "True" are valid values of AzureSearch_SkipContent. The problem is that AzureSearch_SkipContent does not mean that the blob content is ignored.

Blob content contributes in two ways:

  1. Metadata like author, date modified, etc.
  2. Text content of the document.

AzureSearch_SkipContent means that Azure Search only performs #1 and not #2, but the blob still needs to be downloaded, so blob size quota comes into play.

Currently, the only other per-blob processing option is AzureSearch_Skip, which completely skips the blob. You can also set MaxFailedItems / MaxFailedItemsPerBatch to tolerate a specific number of errors, as described in Dealing with errors.
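
For illustration, the error-tolerance settings might look like this in the parameters section of the indexer definition. This is a sketch, assuming the REST-style camelCase spelling of the property names; the value 10 is an arbitrary example, not a recommendation:

```json
{
  "parameters": {
    "maxFailedItems": 10,
    "maxFailedItemsPerBatch": 10
  }
}
```

Note that this only keeps the indexer run from failing; the oversized blobs themselves are still reported as errors and are not indexed.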

I think what would be really useful for this situation is the ability for Azure Search to automatically extract only the storage metadata for large blobs, without you having to process all of your blobs individually. Please feel free to add a suggestion for this on our User Voice site.

Upvotes: 1

Jeremy Hutchinson

Reputation: 2045

It needs a capital T in "True":

if (b.Properties.Length >= 134217728)
{
    b.Metadata["AzureSearch_SkipContent"] = "True";
    await b.SetMetadataAsync();
}

When in doubt, use the bool literal and convert it to a string:

b.Metadata["AzureSearch_SkipContent"] = true.ToString();

or

bool skipIndex = true;
b.Metadata["AzureSearch_SkipContent"] = skipIndex.ToString();

Upvotes: 1
