Ramji

Reputation: 75

Near-similarity and duplicate detection

I have a ticketing system where people create tickets for their issues. When someone creates a new ticket, I need to search Elasticsearch to check whether a similar ticket already exists, using the subject and description as the query input. Below is my index, which has a custom analyzer and mappings:

PUT /ticket_search
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonym": {
          "type": "synonym",
          "synonyms": [
            "error, issue",
            "ticket, request",
            "problem",
            "incident"
          ]
        },
        "my_shingle": {
          "type": "shingle",
          "min_shingle_size": 5,
          "max_shingle_size": 5,
          "output_unigrams": false
        },
        "my_minhash": {
          "type": "min_hash",
          "hash_count": 1,
          "bucket_count": 512,
          "hash_set_size": 1,
          "with_rotation": true
        }
      },
      "analyzer": {
        "similarity_search": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "stop",
            "my_synonym",
            "my_shingle",
            "my_minhash"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "subject": {
        "type": "text",
        "analyzer": "similarity_search"
      },
      "description": {
        "type": "text",
        "analyzer": "similarity_search"
      }
    }
  }
}

I have inserted two sample documents:

PUT _bulk
{ "index": { "_index": "ticket_search", "_id": 1 } }
{ "subject": "Error login into ABC portal", "description": "From the morning im trying to loging into ARC but could not able to do so. Please assist me" }
{ "index": { "_index": "ticket_search", "_id": 2 } }
{ "subject": "Issue sign in  into Automatic Bue Cloud", "description": "Im creating this ticket to let you know I cannot login to the ABC portal" }

When I query with more_like_this (MLT), I get no hits. Please help.

GET ticket_search/_search?explain=true
{
  "query": {
    "more_like_this" : {
      "fields" : ["subject", "description"],
      "like" : "Error",
      "min_term_freq" : 1,
      "max_query_terms" : 12
    }
  }
}
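
For reference, here is a rough Python model of what the shingle step in the analyzer is configured to do (min_shingle_size and max_shingle_size both 5, output_unigrams false). It is only a sketch under simplifying assumptions, not Elasticsearch's implementation: the function name word_shingles is made up for illustration, tokens are split on whitespace, and the stop, synonym and min_hash filters are left out.

```python
from typing import List


def word_shingles(text: str, size: int = 5) -> List[str]:
    """Return every run of `size` consecutive lowercased words in `text`.

    Simplified model of a shingle filter with min_shingle_size ==
    max_shingle_size == size and output_unigrams = false. Stop-word
    removal, synonyms and MinHash are deliberately not modelled.
    """
    tokens = text.lower().split()
    # With fewer than `size` tokens the range is empty, so nothing is emitted.
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


if __name__ == "__main__":
    # Single-word text, like the "like" value in the query above.
    print(word_shingles("Error"))
    # Longer text, taken from the second sample document.
    print(word_shingles("I cannot login to the ABC portal"))
```

In this model a text shorter than five words produces no shingles at all, which may be worth checking against the real analyzer output for short inputs such as the one-word query.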

I did try what the official documentation suggested, but it was all in vain. I want to build an index that stores the subject and description. When I search using a subject and description, I should get all similar or near-duplicate results, which would then be suggested to the user while they create a new ticket.

Upvotes: 0

Views: 25

Answers (0)
