auny

Reputation: 1970

Elasticsearch: Faceted query with terms returning unexpected result

I am trying to run a faceted query on some logs that I have stored in ES. The logs look something like

{"severity": "informational","message_hash_value": "00016B15", "user-agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1", "host": "192.168.8.225", "version": "1.0", "user": "[email protected]", "created_timestamp": "2013-03-01T15:34:00", "message": "User viewed contents", "inserted_timestamp": "2013-03-01T15:34:00"}

The query that I am trying to run is

curl -XGET 'http://127.0.0.1:9200/logs-*/logs/_search' -d '{
    "from": 0, "size": 0,
    "facets" : {
         "user" : {
            "terms" : {"field" : "user", "size" : 999999 } } } }'

Notice that the "user" field in the logs is an email address. The problem is that the terms facet query I use returns the following list of terms from the user field:

u'facets': {u'user': {u'_type': u'terms', u'total': 2004, u'terms': [{u'count': 1002,u'term': u'test.co'}, {u'count': 320, u'term': u'user_1'}, {u'count': 295,u'term': u'user_2'}

Note that the list contains the term

{u'count': 1002,u'term': u'test.co'}

which is the domain name of the users' email addresses. Why is Elasticsearch treating the domain as a separate term?

Running a query to check the mappings

curl -XGET 'http://127.0.0.1:9200/logs-*/_mapping?pretty=true'

yields the following for the "user" field

"user" : {
      "type" : "string"
    },

Upvotes: 1

Views: 741

Answers (1)

Dan Noble

Reputation: 743

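This happens because Elasticsearch's default analyzer splits the field value on "@" (as well as on whitespace and punctuation) at index time, so each email address is stored as several separate terms and the facet counts those terms instead of whole addresses. You can get around this by telling Elasticsearch not to run an analyzer on this field, but the mapping of an already-indexed field cannot be changed in place, so you will have to reindex all of your data.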
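To make the effect concrete, here is a minimal Python sketch of the difference between faceting on an analyzed field and on a not_analyzed one. It is only an illustration: `rough_analyze` is a hypothetical stand-in that lowercases and splits on "@" and whitespace, not Elasticsearch's real analyzer, and the addresses are made up.

```python
import re
from collections import Counter


def rough_analyze(value):
    """Crude stand-in for an analyzer: lowercase, then split on "@" and whitespace."""
    return [tok for tok in re.split(r"[@\s]+", value.lower()) if tok]


# Hypothetical field values, not taken from the real logs.
users = ["alice@example.com", "bob@example.com", "alice@example.com"]

# Analyzed field: a terms facet counts the individual tokens.
analyzed_counts = Counter(tok for u in users for tok in rough_analyze(u))

# not_analyzed field: a terms facet counts each whole value as one term.
not_analyzed_counts = Counter(users)

print(analyzed_counts.most_common())
print(not_analyzed_counts.most_common())
```

In the analyzed case the shared domain ends up as the most frequent term, which mirrors the `test.co` entry in the question; in the not_analyzed case only full addresses appear.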

Create your new index

curl -XPUT 'http://localhost:9200/logs-new'

Specify in this new index's mapping that you don't want to analyze the "user" field

curl -XPUT 'http://localhost:9200/logs-new/logs/_mapping' -d '{
    "logs" : {
        "properties" : {
            "user" : {
                "type" : "string", 
                "index" : "not_analyzed"
            }
        }
    }
}'

Index a document

curl -XPOST 'http://localhost:9200/logs-new/logs' -d '{
    "created_timestamp": "2013-03-01T15:34:00", 
    "host": "192.168.8.225", 
    "inserted_timestamp": "2013-03-01T15:34:00", 
    "message": "User viewed contents", 
    "message_hash_value": "00016B15", 
    "severity": "informational", 
    "user": "[email protected]", 
    "user-agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1", 
    "version": "1.0"
}'
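Since every existing document has to be indexed again, posting them one at a time may be slow. As a rough sketch, assuming your Elasticsearch version exposes the `_bulk` endpoint and that you have already read the old documents into a list (that part is not shown), the request body could be built like this. `build_bulk_body` and the sample documents are hypothetical names and data used only for illustration.

```python
import json


def build_bulk_body(docs, index="logs-new", doc_type="logs"):
    """Return a newline-delimited bulk body: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        # Action line: index this document into the new index and type.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(doc))
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


# Hypothetical documents standing in for data read from the old index.
docs = [
    {"user": "alice@example.com", "message": "User viewed contents"},
    {"user": "bob@example.com", "message": "User viewed contents"},
]

body = build_bulk_body(docs)
print(body)
```

The resulting text could be saved to a file and sent with curl's `--data-binary` option to the `_bulk` endpoint; check the bulk API documentation for your version before relying on the exact action format.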

The elasticsearch facet will now display the entire email address

curl -XGET 'http://localhost:9200/logs-new/logs/_search?pretty' -d '{
    "from":0, 
    "size":0, 
    "facets" : { 
         "user" : { 
            "terms" : {
                "field" : "user", 
                "size" : 999999 
            }
        } 
    }
}'

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ ]
  },
  "facets" : {
    "user" : {
      "_type" : "terms",
      "missing" : 0,
      "total" : 1,
      "other" : 0,
      "terms" : [ {
        "term" : "[email protected]",
        "count" : 1
      } ]
    }
  }
}
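If you consume this response from a script, a small helper can flatten the facet into a term-to-count mapping and flag any term that is not a full address, which is a quick way to confirm the new mapping took effect. This is a sketch only: `facet_counts` is a hypothetical helper, and the sample response is made up to match the shape of the result above.

```python
def facet_counts(response, facet_name="user"):
    """Map each facet term to its count; return an empty dict if the facet is absent."""
    terms = response.get("facets", {}).get(facet_name, {}).get("terms", [])
    return {entry["term"]: entry["count"] for entry in terms}


# Hypothetical response with the same structure as the facet result shown above.
sample = {
    "facets": {
        "user": {
            "_type": "terms",
            "missing": 0,
            "total": 3,
            "other": 0,
            "terms": [
                {"term": "alice@example.com", "count": 2},
                {"term": "bob@example.com", "count": 1},
            ],
        }
    }
}

counts = facet_counts(sample)

# Terms without "@" would suggest the field is still being analyzed.
suspicious = [term for term in counts if "@" not in term]

print(counts)
print(suspicious)
```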

References:

Core Types: http://www.elasticsearch.org/guide/reference/mapping/core-types/

Reindexing with a new mapping: https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/tCaXgjfUFVU

Upvotes: 2
