Jacket

Can I extract the actual value of a not_analyzed field when _source is disabled?

I have the following mapping:

{
   "articles":{
      "mappings":{
         "article":{
            "_all":{
               "enabled":false
            },
            "_source":{
               "enabled":false
            },
            "properties":{
               "content":{
                  "type":"string",
                  "norms":{
                     "enabled":false
                  }
               },
               "url":{
                  "type":"string",
                  "index":"not_analyzed"
               }
            }
         }
      },
      "settings":{
         "index":{
            "refresh_interval":"30s",
            "number_of_shards":"20",
            "analysis":{
               "analyzer":{
                  "default":{
                     "filter":[
                        "icu_folding",
                        "icu_normalizer"
                     ],
                     "type":"custom",
                     "tokenizer":"icu_tokenizer"
                  }
               }
            },
            "number_of_replicas":"1"
         }
      }
   }
}

The question is: is it possible to somehow extract the actual values of the url field, given that it is not_analyzed and _source is disabled? I only need to do this once for this index, so even a hacky approach is acceptable.

I know that not_analyzed means the string won't be tokenized, so it makes sense to me that it should be stored somewhere, but I don't know whether it is stored as a hash or verbatim (1:1), and I couldn't find information about this in the documentation.
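To make the analyzed vs. not_analyzed distinction concrete, here is a toy Python model. It assumes a crude regex tokenizer as a stand-in for a real analyzer (it is not the ICU chain from the mapping above). An analyzed field is broken into many small terms, while a not_analyzed field contributes its entire value as a single term. Lucene keeps terms in its term dictionary as byte strings (prefix-compressed, not hashed), so that single term is the original value.

```python
import re

# Toy model only: real Lucene/Elasticsearch analysis is far more involved.
# Crude stand-in for an analyzed string field: lowercase the value, then split
# on any run of characters that is not a lowercase letter or digit.
def analyzed_terms(value: str) -> list[str]:
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]

# A not_analyzed field emits the whole value, unchanged, as exactly one term.
def not_analyzed_terms(value: str) -> list[str]:
    return [value]

url = "https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-fielddata-fields.html"
print(analyzed_terms(url))      # many fragments; the original URL is lost
print(not_analyzed_terms(url))  # the full URL as a single term
```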

My servers are running ES 1.4.4 on JVM 1.8.0_31.

Answers (1)

IanGabes

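You can read the url values back from fielddata. Fielddata is built from the terms stored in the Lucene index itself, so it returns exactly the terms that queries match against. Because url is not_analyzed, each term is the complete URL exactly as you indexed it. (For an analyzed field such as content you would only get back the individual tokens, not the original text.)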
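Conceptually, fielddata for a string field without doc values is built by "uninverting" the inverted index: the index maps each term to the documents that contain it, and fielddata flips that into document to terms. A minimal Python sketch of that flip, as a toy model (not Elasticsearch's actual data structures), using the two URLs and ids from the example that follows:

```python
from collections import defaultdict

def uninvert(postings: dict[str, list[str]]) -> dict[str, list[str]]:
    """Flip term -> doc ids into doc id -> terms (a toy stand-in for fielddata)."""
    per_doc: dict[str, list[str]] = defaultdict(list)
    for term, doc_ids in postings.items():
        for doc_id in doc_ids:
            per_doc[doc_id].append(term)
    # Sorting each document's values just makes the output deterministic.
    return {doc_id: sorted(terms) for doc_id, terms in per_doc.items()}

# Toy postings for the not_analyzed "url" field: every term is a whole URL.
url_postings = {
    "https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-fielddata-fields.html": ["1"],
    "http://stackoverflow.com/questions/37488389/can-i-extract-the-actual-value-of-not-analyzed-field-when-source-is-disabled": ["2"],
}

for doc_id, values in sorted(uninvert(url_postings).items()):
    print(doc_id, values)
```

Because each term is an entire URL, the per-document values come back as the original strings.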

Using a smaller subset of the index you provided, I indexed two URLs:

POST /articles/article/1
{
    "url":"https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-fielddata-fields.html"
}
POST /articles/article/2
{
    "url":"http://stackoverflow.com/questions/37488389/can-i-extract-the-actual-value-of-not-analyzed-field-when-source-is-disabled"
}

This query then returns a "fields" object for each hit:

GET /articles/article/_search
{
    "fielddata_fields" : ["url"]
}

This gives the following results:

"hits": [
         {
            "_index": "articles",
            "_type": "article",
            "_id": "2",
            "_score": 1,
            "fields": {
               "url": [
                  "http://stackoverflow.com/questions/37488389/can-i-extract-the-actual-value-of-not-analyzed-field-when-source-is-disabled"
               ]
            }
         },
         {
            "_index": "articles",
            "_type": "article",
            "_id": "1",
            "_score": 1,
            "fields": {
               "url": [
                  "https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-fielddata-fields.html"
               ]
            }
         }
      ]
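The search above returned both documents only because the test index holds two; a plain _search returns 10 hits by default. To pull every URL out of a real index once, a minimal Python sketch (standard library only) could page through the index with a scroll, requesting fielddata_fields on each page. Assumptions: the node address comes from a hypothetical ES_URL environment variable, the page size of 500 is arbitrary, and fielddata_fields is assumed to be honoured on scroll requests the same way as on a plain search, so try it against a small test index first. Also note that on 1.4, fielddata for a string field without doc values is loaded into the JVM heap, so watch heap usage on a large 20-shard index.

```python
import json
import os
import urllib.request

# Index and type from the question.
SEARCH_PATH = "/articles/article/_search"

def first_page_body(page_size: int) -> dict:
    """The fielddata_fields request from above, plus match_all and an explicit page size."""
    return {"query": {"match_all": {}}, "size": page_size, "fielddata_fields": ["url"]}

def extract_urls(response: dict) -> dict[str, list[str]]:
    """Map each hit's _id to its url values (empty list if a doc has no url)."""
    return {
        hit["_id"]: hit.get("fields", {}).get("url", [])
        for hit in response["hits"]["hits"]
    }

def _post(url: str, body: bytes) -> dict:
    req = urllib.request.Request(url, data=body, method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

def dump_all_urls(es_url: str, page_size: int = 500) -> dict[str, list[str]]:
    """Page through the whole index with a scroll and collect every url."""
    body = json.dumps(first_page_body(page_size)).encode("utf-8")
    page = _post(f"{es_url}{SEARCH_PATH}?scroll=1m", body)
    collected: dict[str, list[str]] = {}
    while page["hits"]["hits"]:  # an empty page means the scroll is exhausted
        collected.update(extract_urls(page))
        # ES 1.x accepts the raw scroll id as the body of a scroll request.
        page = _post(f"{es_url}/_search/scroll?scroll=1m", page["_scroll_id"].encode("utf-8"))
    return collected

if __name__ == "__main__":
    es_url = os.environ.get("ES_URL")  # assumed, e.g. http://localhost:9200
    if es_url:
        for doc_id, urls in dump_all_urls(es_url).items():
            print(f"{doc_id}\t{' '.join(urls)}")
    else:
        print("set ES_URL to the base URL of one of your nodes")
```

Running it with ES_URL set and redirecting stdout to a file gives one tab-separated id/URL line per document.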

Hope that helps!
