ivanacorovic

Reputation: 2867

Can I specify a regexp in the stopwords for the stop analyzer in Elasticsearch?

I want an analyzer that skips the words "g" and "l" and every decimal number it comes across. I'm not sure whether the stop analyzer is the right tool, or how to tell it to skip decimal numbers. This is what I have so far:

PUT /products
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": [ "l", "g" ]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "my_stopwords" ]
        }
      }
    }
  }
}

How can I change it so that decimal numbers are skipped as well?

Upvotes: 2

Views: 777

Answers (1)

ivanacorovic


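Me again. It doesn't seem to be possible to put a regexp in stopwords. However, I managed to work around it by adding a second filter called filter_amount, which uses pattern_replace to replace every number (with an optional decimal part after a dot or comma) with an empty string. This is what it looks like: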

             "filter_amount": {
              "type": "pattern_replace",
              "pattern": "[\\d]+([\\.,][\\d]+)?",
              "replacement": ""
             }

So this is what the settings should look like:

PUT /products
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": [ "l", "g" ]
        },
        "filter_amount": {
          "type": "pattern_replace",
          "pattern": "[\\d]+([\\.,][\\d]+)?",
          "replacement": ""
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "lowercase", "my_stopwords", "filter_amount" ]
        }
      }
    }
  }
}

The rest is the same. Cheers!
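
To check what the analyzer actually emits, the `_analyze` API can be run against the index. This is only a sketch: it assumes the `products` index was created with the settings above and that the Elasticsearch version in use accepts a JSON request body for `_analyze` (older releases take `analyzer` and `text` as query parameters instead). The sample text is made up for illustration.

```json
POST /products/_analyze
{
  "analyzer": "my_analyzer",
  "text": "2.5 l milk 500 g"
}
```

Keep in mind that pattern_replace rewrites a token's text rather than dropping the token, so purely numeric tokens will most likely come back as empty strings instead of disappearing. If that matters, appending a `length` filter with `"min": 1` after filter_amount should remove them. Treat that as an assumption and confirm it against the `_analyze` output. The pattern is also unanchored, so digits inside a longer token (for example "500g") would be stripped and leave the rest of the token behind.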

Upvotes: 2
