Sairaj

Reputation: 199

Lucene index field value is stripped of all html tags

I have a Lucene index in which one of the fields is mapped to Sitecore's rich text field.

Since this field value contains html content for most of the items sharing the template, I expected html content to be returned when fetching the item's field value. However, I noticed that the value returned is stripped of all html tags.

I tried changing the indexType to "UNTOKENIZED", but this did not solve the problem. I understand that Lucene does this to allow searching on that field, but searching is not a requirement in my case and I want this behavior overridden.

Upvotes: 5

Views: 778

Answers (2)

Dan Tuohy

Reputation: 1

You should be able to create a computed index field and this should save the HTML correctly in the index.

using Sitecore.ContentSearch;
using Sitecore.ContentSearch.ComputedFields;
using Sitecore.Data.Items;

public class TileHtml : IComputedIndexField
{
    // Both properties are set from the <field> element in the index configuration.
    public string FieldName { get; set; }
    public string ReturnType { get; set; }

    public object ComputeFieldValue(IIndexable indexable)
    {
        // SitecoreIndexableItem converts implicitly to Item.
        Item indexedContent = indexable as SitecoreIndexableItem;
        if (indexedContent == null)
        {
            return null;
        }

        // Return the raw field value so the HTML markup is preserved.
        var field = indexedContent.Fields[ITileConstants.TileHtmlFieldName];
        if (field == null || string.IsNullOrWhiteSpace(field.Value))
        {
            return null;
        }

        return field.Value;
    }
}

You can then register the field in your Lucene index configuration:

<fields hint="raw:AddComputedIndexField">
    <field fieldName="TileHtml" storageType="YES" indexType="TOKENIZED">Namespace.TileHtml, Assembly</field>
</fields>

Upvotes: 0

Marek Musielak

Reputation: 27142

It happens because there is a RichTextFieldReader assigned to the html and rich text fields:

<fieldReader 
    fieldTypeName="html|rich text"                                     
    fieldNameFormat="{0}"
    fieldReaderType="Sitecore.ContentSearch.FieldReaders.RichTextFieldReader, Sitecore.ContentSearch" />

In Sitecore 8.1 it's defined in Sitecore.ContentSearch.Lucene.DefaultIndexConfiguration.config.

It strips out all the tags using HtmlField.GetPlainText().

You can try adding another section at the same level as the <mapFieldByTypeName hint="raw:AddFieldReaderByFieldTypeName"> section, something like:

<mapFieldByFieldName hint="raw:AddFieldReaderByFieldName">
    <fieldReader 
        fieldName="yourFieldName"
        fieldReaderType="Sitecore.ContentSearch.FieldReaders.DefaultFieldReader, Sitecore.ContentSearch" />
</mapFieldByFieldName>

Mapping by fieldName has higher priority than mapping by field type, so it will use the field reader specified for your field instead of the one specified for the type of your field.
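
If you would rather not edit the default configuration file directly, the same mapping could probably go into a patch include file. This is only a sketch: the file name is hypothetical, the element path is what I would expect for the default Lucene index configuration in 8.1, and the raw: prefix on the hint is assumed to mirror the existing mapFieldByTypeName section. Check all of it against the config in your installation.

```xml
<!-- Hypothetical patch file, e.g. App_Config/Include/zzz.RawRichTextField.config -->
<configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">
  <sitecore>
    <contentSearch>
      <indexConfigurations>
        <defaultLuceneIndexConfiguration>
          <fieldReaders>
            <!-- Assumed placement: sibling of the existing mapFieldByTypeName section -->
            <mapFieldByFieldName hint="raw:AddFieldReaderByFieldName">
              <!-- "yourFieldName" is a placeholder for the rich text field's name -->
              <fieldReader
                  fieldName="yourFieldName"
                  fieldReaderType="Sitecore.ContentSearch.FieldReaders.DefaultFieldReader, Sitecore.ContentSearch" />
            </mapFieldByFieldName>
          </fieldReaders>
        </defaultLuceneIndexConfiguration>
      </indexConfigurations>
    </contentSearch>
  </sitecore>
</configuration>
```

Documents that are already indexed keep the stripped value, so rebuild the index after changing the field reader.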

Upvotes: 5
