Reputation: 273
I want to index the month field of a BibTeX entry into Elasticsearch and make it searchable via the range query. This requires the underlying field type to be some kind of numeric datatype. In my case short would be sufficient.
The BibTeX month field in its canonical form requires a three-character abbreviation, so I tried to use a char_filter like so:
...
"char_filter": {
  "month_char_filter": {
    "type": "mapping",
    "mappings": [
      "jan => 1",
      "feb => 2",
      "mar => 3",
      ...
      "nov => 11",
      "dec => 12"
    ]
  }
...
"normalizer": {
  "month_normalizer": {
    "type": "custom",
    "char_filter": [ "month_char_filter" ]
  },
And I set up the mappings like this:
...
"month": {
  "type": "short",
  "normalizer": "month_normalizer"
},
...
But it doesn't seem to work, since a field of type short supports neither normalizers nor analyzers.
So what would be the approach to implement a mapping like the one in the char_filter part, so that range queries are possible?
Upvotes: 0
Views: 112
Reputation: 217254
Your approach intuitively makes sense; however, normalizers can only be applied to keyword fields and analyzers to text fields.
Another approach would be to leverage ingest processors and use the script processor to do that mapping at indexing time.
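To make the idea concrete outside Elasticsearch, here is a minimal Python sketch of the lookup such a script would perform. The function name month_to_number is hypothetical, and the lowercasing and whitespace stripping are added assumptions in case the input is not already in canonical form; anything unrecognized maps to 0.

```python
# Canonical three-letter BibTeX month abbreviations, in calendar order.
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def month_to_number(month):
    """Return 1-12 for a known abbreviation and 0 for anything else.

    Hypothetical helper: lowercasing and stripping are assumptions,
    not something the original question asks for.
    """
    key = (month or "").strip().lower()
    # The list position is zero-based, so add 1 to get the month number.
    return MONTHS.index(key) + 1 if key in MONTHS else 0


if __name__ == "__main__":
    for value in ["feb", "Nov", "xyz", None]:
        print(value, "->", month_to_number(value))
```

Keeping 0 as the "unknown" value means a range query starting at 1 would skip documents whose month could not be parsed.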
Below you can find a simulation of such a script processor that would create a new field called monthNum based on the month present in the month field.
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "script": {
          "source": """
            def mapping = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec'];
            ctx.monthNum = mapping.indexOf(ctx.month) + 1;
          """
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "month": "feb"
      }
    },
    {
      "_source": {
        "month": "mar"
      }
    },
    {
      "_source": {
        "month": "jul"
      }
    },
    {
      "_source": {
        "month": "aug"
      }
    },
    {
      "_source": {
        "month": "nov"
      }
    },
    {
      "_source": {
        "month": "dec"
      }
    },
    {
      "_source": {
        "month": "xyz"
      }
    }
  ]
}
Resulting documents (note that an unrecognized value such as xyz yields monthNum 0, since indexOf returns -1):
{
  "docs" : [
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_type",
        "_id" : "_id",
        "_source" : {
          "monthNum" : 2,
          "month" : "feb"
        },
        "_ingest" : {
          "timestamp" : "2019-05-08T12:28:27.006Z"
        }
      }
    },
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_type",
        "_id" : "_id",
        "_source" : {
          "monthNum" : 3,
          "month" : "mar"
        },
        "_ingest" : {
          "timestamp" : "2019-05-08T12:28:27.006Z"
        }
      }
    },
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_type",
        "_id" : "_id",
        "_source" : {
          "monthNum" : 7,
          "month" : "jul"
        },
        "_ingest" : {
          "timestamp" : "2019-05-08T12:28:27.006Z"
        }
      }
    },
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_type",
        "_id" : "_id",
        "_source" : {
          "monthNum" : 8,
          "month" : "aug"
        },
        "_ingest" : {
          "timestamp" : "2019-05-08T12:28:27.006Z"
        }
      }
    },
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_type",
        "_id" : "_id",
        "_source" : {
          "monthNum" : 11,
          "month" : "nov"
        },
        "_ingest" : {
          "timestamp" : "2019-05-08T12:28:27.006Z"
        }
      }
    },
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_type",
        "_id" : "_id",
        "_source" : {
          "monthNum" : 12,
          "month" : "dec"
        },
        "_ingest" : {
          "timestamp" : "2019-05-08T12:28:27.006Z"
        }
      }
    },
    {
      "doc" : {
        "_index" : "_index",
        "_type" : "_type",
        "_id" : "_id",
        "_source" : {
          "monthNum" : 0,
          "month" : "xyz"
        },
        "_ingest" : {
          "timestamp" : "2019-05-08T12:28:27.006Z"
        }
      }
    }
  ]
}
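To get from the simulation to actual range queries, the script would presumably have to be stored as a pipeline, monthNum mapped as short, and the range query run against monthNum. The sketch below only builds those three request bodies as Python dicts and prints them as JSON. The pipeline id bibtex-month and the index name bibtex are hypothetical, the typeless mapping assumes Elasticsearch 7 or later, and the index.default_pipeline setting is used on the assumption that every document in the index should pass through the pipeline.

```python
import json

# Body for: PUT _ingest/pipeline/bibtex-month  (pipeline id is hypothetical)
pipeline = {
    "description": "Derive a numeric monthNum from the BibTeX month field",
    "processors": [
        {
            "script": {
                # Same Painless logic as in the simulation above.
                "source": (
                    "def mapping = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', "
                    "'jul', 'aug', 'sep', 'oct', 'nov', 'dec']; "
                    "ctx.monthNum = mapping.indexOf(ctx.month) + 1;"
                )
            }
        }
    ],
}

# Body for: PUT bibtex  (index name is hypothetical; typeless mapping assumes 7.x+)
index_body = {
    "settings": {"index.default_pipeline": "bibtex-month"},
    "mappings": {
        "properties": {
            "month": {"type": "keyword"},
            "monthNum": {"type": "short"},
        }
    },
}


def month_range_query(first, last):
    """Build a body for GET bibtex/_search with inclusive month bounds."""
    return {"query": {"range": {"monthNum": {"gte": first, "lte": last}}}}


if __name__ == "__main__":
    for body in (pipeline, index_body, month_range_query(3, 8)):
        print(json.dumps(body, indent=2))
```

With the sample documents above, a query built by month_range_query(3, 8) would be expected to match mar, jul and aug, while the xyz document (monthNum 0) falls outside any range that starts at 1.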
Upvotes: 3