Reputation: 23499
(I'm doing this with a fresh copy of Elasticsearch 1.5.2)
I've defined a custom analyzer and it's working:
curl -XPUT 127.0.0.1:9200/test -d '{
"settings": {
"index": {
"analysis": {
"tokenizer": {
"UrlTokenizer": {
"type": "pattern",
"pattern": "https?://([^/]+)",
"group": 1
}
},
"analyzer": {
"accesslogs": {
"tokenizer": "UrlTokenizer"
}
}
}
}
}
}'; echo
curl '127.0.0.1:9200/test/_analyze?analyzer=accesslogs&text=http://192.168.1.1/123?a=2#1111' | json_pp
Now I apply it to an index:
curl -XPUT 127.0.0.1:9200/test/accesslogs/_mapping -d '{
"accesslogs" : {
"properties" : {
"referer" : { "type" : "string", "copy_to" : "referer_domain" },
"referer_domain": {
"type": "string",
"analyzer": "accesslogs"
}
}
}
}'; echo
From the mapping I can see both of them are applied.
Now I try to insert some data,
curl 127.0.0.1:9200/test/accesslogs/ -d '{
"referer": "http://192.168.1.1/aaa.php",
"response": 100
}';echo
And the copy_to
field, aka referer_domain
was not generated and if I try to add a field with that name, the tokenizer is not applied either.
Any ideas?
Upvotes: 2
Views: 3330
Reputation: 52368
copy_to
works but, you are assuming that since you don't see the field being generated, it doesn't exist.
When you return your document back (with GET /test/accesslogs/1
for example), you don't see the field under _source
. This contains the original document that has been indexed. And you didn't index any referer_domain
field, just referer
and response
. And this is the reason why you don't see it.
But Elasticsearch does create that field in the inverted index. You can use it to query, compute or retrieve if you stored it.
Let me exemplify my statements:
GET /test/accesslogs/_search
{
"fielddata_fields": ["referer","response","referer_domain"]
}
"referer_domain": {
"type": "string",
"analyzer": "accesslogs",
"store" : true
}
with this:
GET /test/accesslogs/_search
{
"fields": ["referer","response","referer_domain"]
}
In conclusion, copy_to
modifies the indexed document, not the source document. You can query your documents having that field and it will work because the query looks at the inverted index. If you want to retrieve that field you need to store it, as well. But you will not see that field in the _source
field because _source
is the initial document that has been indexed. And the initial document doesn't contain referer_domain
.
Upvotes: 9