How can I skip indexing a specific property, but still retrieve its contents when querying in Elasticsearch and NEST?

Given this index-model class:

public class ProductIndexModel
{
    public Guid Id { get; set; }
    public DateTime Created { get; set; }
    public string Name { get; set; }
    public JObject DynamicContent { get; set; }
}

I am struggling to do the following:

My reason for not indexing the DynamicContent property is that it's a JSON blob in which property paths occasionally clash because their values are of different types (e.g. object vs. string, int vs. string, and so on). For example, indexing two objects whose value at path /dynamiccontent.id is an int in one and a string in the other might give me:

error: Type: mapper_parsing_exception Reason: "failed to parse [dynamiccontent.id]" CausedBy: Type: json_parse_exception Reason: "Current token (START_OBJECT) not numeric, can not use numeric value accessors

I create the index in this way:

var createIndexResponse = await _elasticClient.CreateIndexAsync(indexName, c => c
    .InitializeUsing(settingsState)
    .Mappings(ms => ms
        .Map<ProductIndexModel>(m => m
            .AutoMap()
        )
    )
);

Here settingsState is of type Nest.IndexState and holds some tokenizers and other settings that are irrelevant to the question.

I query the index in this way:

ISearchRequest SearchRequest(SearchDescriptor<ProductIndexModel> x) => x
    .Index(indexName)
    .Query(q => q.Bool(bq => bq.Filter(filters)))
    .From(filter.Offset)
    .Size(filter.PageSize)
;

var searchResponse = await _elasticClient.SearchAsync<ProductIndexModel>(SearchRequest);

Here filters is a dynamically constructed generic list of filters that narrow down the results.

So I want to keep DynamicContent un-indexed, but still be able to get its (raw) contents when querying.

I have tried annotating DynamicContent with Nest.IgnoreAttribute, but that leaves it out entirely, so the value is null when retrieved. Any suggestions on how to just "store" the value without indexing it, using NEST?

Upvotes: 1

Views: 854

Answers (1)

Russ Cam

Reputation: 125498

Since DynamicContent is a Json.NET JObject type, if you're using NEST 6.x, you will need to hook up the JsonNetSerializer to be able to correctly index an instance of JObject.

Once this serializer is hooked up, you can attribute the model with [Object(Enabled = false)], which sets enabled=false for the field, meaning the property is persisted in _source but not parsed or indexed.
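
For reference, this is roughly what the resulting field mapping should look like when you inspect the index mapping. It is a sketch rather than captured output: the field name dynamicContent assumes NEST's default camel-casing of property names, so it may differ if you have configured a custom field name inferrer.

```json
{
  "properties": {
    "dynamicContent": {
      "type": "object",
      "enabled": false
    }
  }
}
```

Because the field is disabled, Elasticsearch keeps whatever JSON you send in _source without parsing it, so conflicting types under the same path no longer cause mapping errors. The trade-off is that you cannot search, filter or aggregate on anything inside DynamicContent.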

With JObject in particular, NEST's automapping (which is needed for the attribute to be taken into account when mapping) will generate a large "properties" object for JObject, which is wholly unnecessary since the field will not be parsed or indexed. In this particular case, fluent mapping is a better choice than attribute mapping. Here's an example:

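// Requires the NEST and NEST.JsonNetSerializer NuGet packages
using System;
using Elasticsearch.Net;
using Nest;
using Nest.JsonNetSerializer;
using Newtonsoft.Json.Linq;

private static void Main()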
{
    var defaultIndex = "default_index";
    var pool = new SingleNodeConnectionPool(new Uri("http://localhost:9200"));

    var settings = new ConnectionSettings(pool, JsonNetSerializer.Default)
        .DefaultIndex(defaultIndex)
        .DefaultTypeName("_doc");

    var client = new ElasticClient(settings);

    if (client.IndexExists(defaultIndex).Exists)
        client.DeleteIndex(defaultIndex);

    var createIndexResponse = client.CreateIndex(defaultIndex, c => c
        .Mappings(m => m
            .Map<ProductIndexModel>(mm => mm
                .AutoMap() // <-- automap
                .Properties(p => p
                    .Object<JObject>(o => o
                        .Name(n => n.DynamicContent) // <-- override the automap inferred mapping for DynamicContent
                        .Enabled(false)
                    )
                )
            )
        )
    );

    var indexResponse = client.Index(new ProductIndexModel
    {
        Id = Guid.NewGuid(),
        Created = DateTime.UtcNow,
        Name = "foo",
        DynamicContent = new JObject 
        {
            { "prop1", "value1" },
            { "prop2", new JArray(1, 2, 3, 4) }
        }
    }, i => i.Refresh(Refresh.WaitFor));

    var searchResponse = client.Search<ProductIndexModel>(s => s
        .MatchAll()
    );
}

public class ProductIndexModel
{
    public Guid Id { get; set; }
    public DateTime Created { get; set; }
    public string Name { get; set; }
    [Object(Enabled = false)]
    public JObject DynamicContent { get; set; }
}
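
To confirm that the raw contents come back when querying, you could read DynamicContent off the returned documents. The following is a minimal sketch, assuming NEST 6.x and the ProductIndexModel class above; DynamicContentReader and PrintDynamicContent are hypothetical names used only for illustration.

```csharp
using System;
using Nest;
using Newtonsoft.Json;

public static class DynamicContentReader
{
    // Hypothetical helper: writes the raw DynamicContent of each returned document.
    public static void PrintDynamicContent(ISearchResponse<ProductIndexModel> response)
    {
        foreach (var product in response.Documents)
        {
            // DynamicContent is deserialized from _source by the Json.NET serializer;
            // it is null when the document was indexed without that property.
            var raw = product.DynamicContent?.ToString(Formatting.None);
            Console.WriteLine($"{product.Id}: {raw ?? "(no dynamic content)"}");
        }
    }
}
```

With the example above, this would be called as DynamicContentReader.PrintDynamicContent(searchResponse) right after the search.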

Upvotes: 3
