daisy

Reputation: 23499

copy_to and custom analyzer not working

(I'm doing this with a fresh copy of Elasticsearch 1.5.2)

I've defined a custom analyzer and it's working:

curl -XPUT 127.0.0.1:9200/test -d '{
  "settings": {
      "index": {
        "analysis": {
          "tokenizer": {
            "UrlTokenizer": {
              "type":    "pattern",
              "pattern": "https?://([^/]+)",
              "group":   1
            }
          },
          "analyzer": {
            "accesslogs": {
              "tokenizer": "UrlTokenizer"
            }
          }
        }
      }
  }
}'; echo

curl '127.0.0.1:9200/test/_analyze?analyzer=accesslogs&text=http://192.168.1.1/123?a=2#1111' | json_pp
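
As a rough local cross-check, the capture group of that pattern can be emulated outside Elasticsearch. This is only a sketch: url_host is a hypothetical helper name, it uses sed rather than the Java regex engine behind the pattern tokenizer, and it prints a single match per input line, whereas the tokenizer emits one token per match.

```shell
# Hypothetical helper (not part of Elasticsearch): approximates UrlTokenizer.
# Prints the first capture group of https?://([^/]+), i.e. the host part.
# Prints nothing when the input contains no http(s) URL.
url_host() {
  printf '%s\n' "$1" | sed -nE 's#.*https?://([^/]+).*#\1#p'
}

url_host 'http://192.168.1.1/aaa.php'   # prints 192.168.1.1
```

If the _analyze call returns a different token for the same input, the analyzer definition is the first thing to recheck.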

Now I apply it to a field in a type mapping:

curl -XPUT 127.0.0.1:9200/test/accesslogs/_mapping -d '{
  "accesslogs" : {
    "properties" : {
      "referer" : { "type" : "string", "copy_to" : "referer_domain" },
      "referer_domain": {
         "type":     "string",
         "analyzer": "accesslogs"
      }
    }
  }
}'; echo

When I fetch the mapping, I can see that both fields are defined.

Now I try to insert some data:

curl 127.0.0.1:9200/test/accesslogs/ -d '{
    "referer": "http://192.168.1.1/aaa.php",
    "response": 100
}';echo

But the copy_to target field, referer_domain, is not generated, and if I add a field with that name to the document myself, the tokenizer does not seem to be applied either.

Any ideas?

Upvotes: 2

Views: 3330

Answers (1)

Andrei Stefan

Reputation: 52368

copy_to works, but you are assuming that because you don't see the field in the returned document, it doesn't exist.

When you get your document back (with GET /test/accesslogs/1, for example), you don't see the field under _source. _source contains the original document exactly as it was sent for indexing, and you didn't index any referer_domain field, only referer and response. That is why you don't see it.

But Elasticsearch does create that field in the inverted index. You can use it in queries and computations, and you can retrieve it if you stored it.

Let me illustrate:

  • You can query that field and you will get results back based on it. If you really want to see what has been indexed for it, you can do this:
GET /test/accesslogs/_search
{
  "fielddata_fields": ["referer","response","referer_domain"]
}
  • You can also retrieve that field if you store it, by adding "store": true to its mapping:
  "referer_domain": {
    "type": "string",
    "analyzer": "accesslogs",
    "store" : true
  }

with this:

GET /test/accesslogs/_search
{
  "fields": ["referer","response","referer_domain"]
}

In conclusion, copy_to modifies the indexed document, not the source document. Queries against referer_domain work because queries look at the inverted index. If you also want to retrieve that field, you need to store it. You will still not see it in _source, because _source is the original document that was sent for indexing, and that document doesn't contain referer_domain.
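
As a minimal sketch of querying the copied field, assuming the index, mapping and document from the question: a term query on referer_domain should match on the host token produced by the accesslogs analyzer. The ES_HOST and QUERY variable names are only illustrative, and the token value is taken from the sample document.

```shell
# Assumed host; override ES_HOST if your node listens elsewhere.
ES_HOST="${ES_HOST:-127.0.0.1:9200}"

# Term queries are not analyzed, so the value must equal the indexed token,
# i.e. the host part that UrlTokenizer extracts from the referer URL.
QUERY='{
  "query": {
    "term": { "referer_domain": "192.168.1.1" }
  }
}'

# The hit's _source will still show only referer and response.
curl -s --max-time 5 -XGET "$ES_HOST/test/accesslogs/_search" -d "$QUERY" \
  || echo "request failed (is Elasticsearch running?)"
echo
```

A hit here, with no referer_domain key in its _source, is the behaviour described above.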

Upvotes: 9
