Jagjit Singh

Reputation: 751

Elasticsearch - Indexing an MS Word attachment and running a full-text search on its content

As the title indicates, I am trying to index MS Word documents and run a full-text search on their content.

I have seen several examples, but I am not able to figure out what I am doing incorrectly.

Relevant Code:

[ElasticsearchType(Name = "AttachmentDocuments")]
public class Attachment
{
    [String(Name = "_content")]
    public string Content { get; set; }
    [String(Name = "_content_type")]
    public string ContentType { get; set; }
    [String(Name = "_name")]
    public string Name { get; set; }

    public Attachment(Task<File> file)
    {
        Content = file.Result.FileContent;
        ContentType = file.Result.FileType;
        Name = file.Result.FileName;
    }
}

The "Content" property above is set to "file.Result.FileContent" in the constructor. The "Content" property is a base64 string.

public class Document
{
    [Number(Name = "Id")]
    public int Id { get; set; }
    [Attachment]
    public Attachment File { get; set; }
    public String Title { get; set; }
}

Below is the method that indexes documents into Elasticsearch:

    public void IndexDocument(Attachment attachmentDocument)
    {
        // Create the index if it does not already exist
        var indexExists = _client.IndexExists(new IndexExistsRequest(ElasticsearchIndexName));
        if (!indexExists.Exists)
        {
            var indexDescriptor =
                new CreateIndexDescriptor(new IndexName {Name = ElasticsearchIndexName}).Mappings(
                    ms => ms.Map<Document>(m => m.AutoMap()));
            _client.CreateIndex(indexDescriptor);
        }

        var doc = new Document()
        {
            Id = 1,
            Title = "Test",
            File = attachmentDocument
        };

        _client.Index(doc);
    }

With the code above, the document gets indexed into the correct index (screenshot from my Elasticsearch host, Searchly):

Searchly Screenshot

The content of the file is "VCXCVXCVXCVXCVXVXCVXCV", and the following query returns zero hits:

        QueryContainer queryContainer = null;
        queryContainer |= new MatchQuery()
        {
            Field = "file",
            Query = "VCXCVXCVXCVXCVXVXCVXCV"
        };

        var searchResult =
            await _client.LowLevel.SearchAsync<string>(ApplicationsIndexName, "document", new SearchRequest()
            {
                From = 0,
                Size = 10,
                Query = queryContainer, 
                Aggregations = GetAggregations()
            });

I would appreciate it if someone could point out what I am doing incorrectly or what I should look into.

Here is a screenshot of the mapping in my Elasticsearch index:

Elasticsearch - Mapping

Upvotes: 0

Views: 189

Answers (1)

Vova Bilyachat

Reputation: 19494

Because you are querying the wrong field. The extracted text of an attachment is stored in its content sub-field, so the field should be file.content:

    queryContainer |= new MatchQuery()
    {
        Field = "file.content",
        Query = "VCXCVXCVXCVXCVXVXCVXCV"
    };
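
To check this independently of the NEST client, the same match query can be sent as a raw request body to the index's _search endpoint. This is only a sketch: it assumes the attachment field is mapped as "file" (as in the Document class from the question) and that the mapping exposes the extracted text as file.content. The from and size values mirror the ones in the question.

```json
{
  "from": 0,
  "size": 10,
  "query": {
    "match": {
      "file.content": "VCXCVXCVXCVXCVXVXCVXCV"
    }
  }
}
```

If this body returns the document while the C# query does not, the problem is more likely in how the request is built (index name, type name) than in the field name.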

Upvotes: 1
