FAlonso

Reputation: 494

ElasticSearch return only specific part of the document

I have a JSON document mimicking the following structure.

{
"mydata": [
      {
        "Key1": "Hello",
        "Key2": "this",
        "Key3": "is",
        "Key4": "line one",
        "Key5": "of the file"
      },
      {
        "Key1": "Hello",
        "Key2": "this",
        "Key3": "is",
        "Key4": "line two",
        "Key5": "of the file"
      }]
}

The index I am using does not have any specific mappings as such. I am able to write a free-text Lucene query like

mydata.Key4:"line one"

which returns the entire document as a result. However, in my case, I would only like to retrieve the first part of the JSON object as the result. Is there a way to achieve this?

{
        "Key1": "Hello",
        "Key2": "this",
        "Key3": "is",
        "Key4": "line one",
        "Key5": "of the file"
}

I found that I can retrieve specific fields using _source_includes and passing the required keys. However, I cannot find an equivalent that returns all the keys within the specific part of the JSON document that matches the query. Is this because of how the document is being indexed? Could anyone guide me here?

EDIT:

I dropped my index and updated the mapping as follows

{
"mappings" : {
     
  "properties" : {
   "data" : {
    "type" : "nested"
   }
  }
 }
}

I re-indexed the document, quickly skimmed through ES documentation and ran the following nested query.

{
"_source": false,
  "query": {
       "nested": {
          "path": "data",
          "query": {
          "match": { 
               "data.Key4": "line one" 
          }
       },
       "inner_hits": {} 
  }
 }
}

However, this still returns every nested object in the document, except that the results now appear under inner_hits:

{
    "took": 1,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.52889514,
        "hits": [{
            "_index": "myindex",
            "_type": "_doc",
            "_id": "QAZJ-nMBi6fwNevjDQJy",
            "_score": 0.52889514,
            "inner_hits": {
                "data": {
                    "hits": {
                        "total": {
                            "value": 2,
                            "relation": "eq"
                        },
                        "max_score": 0.87546873,
                        "hits": [{
                            "_index": "myindex",
                            "_type": "_doc",
                            "_id": "QAZJ-nMBi6fwNevjDQJy",
                            "_nested": {
                                "field": "data",
                                "offset": 0
                            },
                            "_score": 0.87546873,
                            "_source": {
                                "Key1": "Hello",
                                "Key2": "this",
                                "Key3": "is",
                                "Key4": "line one",
                                "Key5": "of the file"
                            }
                        }, {
                            "_index": "myindex",
                            "_type": "_doc",
                            "_id": "QAZJ-nMBi6fwNevjDQJy",
                            "_nested": {
                                "field": "data",
                                "offset": 1
                            },
                            "_score": 0.18232156,
                            "_source": {
                                "Key1": "Hello",
                                "Key2": "this",
                                "Key3": "is",
                                "Key4": "line two",
                                "Key5": "of the file"
                            }
                        }]
                    }
                }
            }
        }]
    }
}

Am I missing something here?

Upvotes: 2

Views: 1744

Answers (1)

Gibbs

Reputation: 22964

Not defining a mapping is the main problem. When you index the data the way you described, the array of objects is flattened: each key is kept as an individual property of type text, and the boundaries between the objects are lost.

When you search, a match therefore brings back the entire document. But if you define a nested mapping for mydata, you can use inner_hits to retrieve only the matching nested objects.
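To make the flattening concrete, here is a rough Python sketch of what happens to an array of objects that is not mapped as nested. It only imitates the behaviour; flatten_object_array is a made-up helper for illustration, not Elasticsearch code, and the sample data is a shortened version of the document in the question.

```python
from collections import defaultdict


def flatten_object_array(field, objects):
    """Imitate non-nested indexing: every key becomes one multi-valued
    field, so the link between values of the same object is lost."""
    flat = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)


# Shortened version of the document from the question.
mydata = [
    {"Key1": "Hello", "Key4": "line one"},
    {"Key1": "Hello", "Key4": "line two"},
]

flat = flatten_object_array("mydata", mydata)
print(flat["mydata.Key4"])  # ['line one', 'line two']
```

Once the values sit side by side in a single field, a query on mydata.Key4 can only say that the document matched, not which object inside it did.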

Edit:

Query to be used:

{
  "_source": false,
  "query": {
    "nested": {
      "path": "data",
      "inner_hits": {        
      },
      "query": {
        "bool": {
          "must": [
            {
              "term": { //To look for exact match
                "data.Key4.keyword": "line one" //need to match line one not line two
              }
            }
          ]
        }
      }
    }
  }
}
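If you consume the result from client code, the matching objects have to be pulled out of inner_hits yourself. Below is a minimal Python sketch, assuming the response has already been parsed into a dict shaped like the one in the question; matching_nested_objects is a hypothetical helper name and sample_response is hand-written test data, not real server output.

```python
def matching_nested_objects(response, path="data"):
    """Collect the _source of every inner hit found under `path`."""
    matches = []
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(path, {})
        for inner_hit in inner.get("hits", {}).get("hits", []):
            matches.append(inner_hit["_source"])
    return matches


# Hand-written sample in the shape of the response shown in the question,
# reduced to a single inner hit (assumed data, for illustration only).
sample_response = {
    "hits": {
        "hits": [{
            "_id": "QAZJ-nMBi6fwNevjDQJy",
            "inner_hits": {
                "data": {
                    "hits": {
                        "hits": [{
                            "_nested": {"field": "data", "offset": 0},
                            "_source": {"Key1": "Hello", "Key4": "line one"},
                        }]
                    }
                }
            },
        }]
    }
}

objects = matching_nested_objects(sample_response)
print(objects)
```

The path argument defaults to "data" only because that is the nested path used in the query above; pass whatever path your mapping uses.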

What happens when you use match:

line one is tokenised as follows:

{
    "tokens": [
        {
            "token": "line",
            "start_offset": 0,
            "end_offset": 4,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "one",
            "start_offset": 5,
            "end_offset": 8,
            "type": "<ALPHANUM>",
            "position": 1
        }
    ]
}

Similarly, line two is tokenised into two tokens: line and two.

A match query is a full-text query, so the text is analysed at both index time and search time. At search time, line one is analysed and ES looks for values containing either line or one. line two contains the token line, hence it is also part of the result.

To avoid that, you have to avoid analysis, so a term query has to be used. It looks for an exact match against the unanalysed value in data.Key4.keyword.
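The difference can be imitated in a few lines of Python. This is only a sketch of the idea: analyze is a crude stand-in for the standard analyzer (lowercase, split on non-alphanumerics), and match_query and term_query are hypothetical helpers that mimic the OR-of-tokens versus exact-value behaviour, not the real scoring.

```python
import re


def analyze(text):
    """Crude stand-in for the standard analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_query(query, value):
    """Match if ANY analysed query token appears in the analysed value
    (match uses OR between tokens by default)."""
    return bool(set(analyze(query)) & set(analyze(value)))


def term_query(query, keyword_value):
    """Match only if the unanalysed value equals the query exactly."""
    return query == keyword_value


values = ["line one", "line two"]

match_hits = [v for v in values if match_query("line one", v)]
term_hits = [v for v in values if term_query("line one", v)]

print(match_hits)  # ['line one', 'line two']
print(term_hits)   # ['line one']
```

The shared token line is what lets line two through under match, while the exact comparison keeps only line one.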

Upvotes: 1
