max

Reputation: 10454

Elasticsearch does not find the word

I've loaded some data into Elasticsearch and validated that it's been indexed:

curl -XGET http://localhost:9200/studies/gov/NCT02953782
{"_index":"studies","_type":"gov","_id":"NCT02953782","_version":5,"found":true,"_source":{"status": "Recruiting", "is_fda_regulated": false, "description": "", "open_study_countries": ["United States"], "phase": "Phase 1/Phase 2", "completion_date": "2018-05-01", "references": [], "is_drug_intervention": true, "keywords": ["Colorectal Neoplasms, Hu5F9-G4, CD47, cetuximab"], "id": "NCT02953782", "title": "Trial of Hu5F9-G4 in Combination With Cetuximab in Patients With Solid Tumors and Advanced Colorectal Cancer", "summary": " This trial will evaluate Hu5F9-G4 in combination with cetuximab. Hu5F9-G4 is a monoclonal antibody which is designed to block a protein called CD47, which is widely expressed on human cancer cells. Blocking CD47 with Hu5F9-G4 may enable the body's immune system to find and destroy the cancer cells. Cetuximab is a monoclonal antibody drug that is used for treatment of certain types of colorectal cancer as well as head and neck cancer.\n The major aims of the study are: (Phase 1b) to define the safety profile and to determine a recommended Phase 2 dose for Hu5F9-G4 in combination with cetuximab, and (Phase 2) to evaluate the objective response rate of Hu5F9-G4 in combination with cetuximab in patients with advanced colorectal cancer. ", "first_received_date": "2016-11-01", "inclusion_criteria": " - Histological Diagnosis - Phase 1b only: Advanced solid malignancy with an emphasis on colorectal, head and neck, breast, pancreatic and ovarian cancers who have been treated with at least one regimen of prior systemic therapy, or who refuse systemic therapy, and for which there is no curative therapy available. 
- Phase 2: - KRAS Mutant CRC: Advanced KRAS mutant CRC who have progressed or are ineligible for both irinotecan and oxaliplatin based chemotherapy - KRAS Wild Type CRC: Advanced KRAS wild type CRC who have progressed or are ineligible for both irinotecan and oxaliplatin based chemotherapy and who are relapsed or refractory to at least 1 prior systemic therapy that included an anti-EGFR antibody, such as cetuximab, panitumumab or others. - Adequate performance status and hematological, liver, and kidney function - Phase 2 only: Willing to consent to 1 mandatory pre-treatment and 1 on-treatment tumor biopsy ", "exclusion_criteria": " - Active brain metastases - Prior treatment with CD47 or signal regulatory protein alpha (SIRPα) targeting agents. - Phase 2 only: second malignancy within the last 3 years. - Known active or chronic hepatitis B or C infection or HIV - Pregnancy or active breastfeeding", "start_date": "2016-11-01"}}

When I search for a word within that document using a simple_query_string search, it does not find it:

curl -XPOST http://localhost:9200/studies/_search?pretty=true -d '{
  "query": {
    "simple_query_string" : {
        "query": "Colorectal",
        "analyzer": "snowball"
    }
  }
}'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

Does anyone know how to make it work?

Upvotes: 0

Views: 362

Answers (1)

Mario Trucco

Reputation: 2011

Your simple_query_string query, like the query_string query, runs against the field named in the index.query.default_field index setting, which defaults to _all. By default, the _all field is analyzed with the Standard Analyzer.

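Your query asks for the Snowball Analyzer, which is probably not what was used at index time. Snowball stems the search term, so the stemmed form of "Colorectal" most likely no longer matches the unstemmed token stored in _all.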
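
One way to check this is to ask the _analyze API how each analyzer tokenizes the search term. This is only a sketch: it assumes a version whose _analyze endpoint accepts a JSON body (2.x or later) and the local node from the question; the Content-Type header is ignored by older versions and required from 6.0 on.

```shell
# Compare how the two analyzers tokenize the search term.
# Assumption: _analyze accepts a JSON body with "analyzer" and "text".
for ANALYZER in standard snowball; do
  BODY="{\"analyzer\": \"$ANALYZER\", \"text\": \"Colorectal\"}"
  echo "== $ANALYZER: $BODY"
  # The fallback message only fires if the node cannot be reached.
  curl -s --max-time 5 -XPOST 'http://localhost:9200/_analyze?pretty=true' \
    -H 'Content-Type: application/json' -d "$BODY" \
    || echo "could not reach Elasticsearch on localhost:9200"
done
```

If the two token lists differ, the term produced at query time cannot match what was written to _all at index time.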

If you don't really need the Snowball Analyzer, you can simply have your query use the Standard Analyzer instead.
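
As a rough sketch, that would be the search from the question with only the analyzer changed (leaving out "analyzer" altogether should behave the same way, assuming _all still has its default analysis). The Content-Type header is not in the original command; it is harmless on old versions and required from 6.0 on.

```shell
# Same search as in the question, but analyzed with "standard" so the
# query term is processed the same way as the default _all field.
BODY='{
  "query": {
    "simple_query_string": {
      "query": "Colorectal",
      "analyzer": "standard"
    }
  }
}'

# Send it to the local node from the question; the fallback message only
# fires if the node cannot be reached.
curl -s --max-time 5 -XPOST 'http://localhost:9200/studies/_search?pretty=true' \
  -H 'Content-Type: application/json' -d "$BODY" \
  || echo "could not reach Elasticsearch on localhost:9200"
```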

Conversely, if you do want the Snowball Analyzer, you should put a mapping that sets it on the _all field before you start indexing documents, but make sure that doing so does not break any of your other queries.

This is how you'd set the snowball analyzer in the mapping for the gov type in the studies index:

curl -XPUT http://localhost:9200/studies/_mapping/gov -d'{
  "_all": {
    "analyzer": "snowball"
  }
}'

After that you'd need to index your documents again.

Upvotes: 2
