Reputation: 24
I'm trying to index and search by email using Tire and elasticsearch.
The problem is that if I search for "[email protected]", I get strange results because of the @ and . symbols. I "solved" it by hacking the query string and adding "email:" before any string I suspect is an email address. If I don't do that, when searching for "[email protected]" I get results such as "[email protected]" or "[email protected]".
include Tire::Model::Search
include Tire::Model::Callbacks

settings :analysis => {
  :analyzer => {
    :whole_email => {
      'tokenizer' => 'uax_url_email'
    }
  }
} do
  mapping do
    indexes :id
    indexes :email, :analyzer => 'whole_email', :boost => 10
  end
end

def self.search(params)
  params[:query] = params[:query].split(" ").map { |x| x =~ EMAIL_REGEXP ? "email:#{x}" : x }.join(" ")
  tire.search(load: {:include => {'event' => 'organizer'}}, page: params[:page], per_page: params[:per_page] || 10) do
    query do
      boolean do
        must { string params[:query] } if params[:query].present?
        must { term :event_id, params[:event_id] } if params[:event_id].present?
      end
    end
    sort do
      by :id, 'desc'
    end
  end
end

def to_indexed_json
  self.to_json
end
When searching with "email:" the analyzer works perfectly, but without it the string is searched without the specified analyzer, and I get lots of undesired results.
Upvotes: 0
Views: 3788
Reputation: 336
Add the field to _all and try searching with the special characters in the email address escaped with a backslash (\).
Example: something\@example\.com
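A rough sketch of that escaping in Ruby, in case it helps. The helper name is made up for illustration, and it only handles @ and ., not the full set of characters that the query_string syntax reserves:

```ruby
# Hypothetical helper (not part of Tire): backslash-escape "@" and "."
# in a search string before handing it to a query_string query.
def escape_email_for_query(text)
  # The block form of gsub inserts the returned string literally.
  text.gsub(/[@.]/) { |ch| "\\#{ch}" }
end

puts escape_email_for_query("something@example.com")
```

This only changes how the query parser reads the string. How the field was tokenized at index time is a separate matter.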
Upvotes: 2
Reputation: 3400
I think your issue has to do with the _all field. By default, every field gets indexed twice: once under its own field name, and again, using a different analyzer, in the _all field.
If you send a query without specifying which field you are searching in, it is executed against the _all field. When you index your doc, the email field's content is indexed again under the _all field (to stop this, set include_in_all: false in your mapping), where it is tokenized the standard way (split on @ and .). This means that unguided queries will give strange results.
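To see why the split tokens cause unrelated matches, here is a toy simulation in plain Ruby. It is not elasticsearch code: split_tokens is a simplified stand-in that assumes the split on @ and . described above (the real analyzer follows more detailed rules), and both addresses are invented for the example.

```ruby
# Simplified stand-in for the default analysis of _all:
# lowercase, then split on "@", "." and whitespace.
def split_tokens(text)
  text.downcase.split(/[@.\s]+/).reject(&:empty?)
end

# Stand-in for a field that keeps the whole value as a single token.
def whole_token(text)
  [text.downcase]
end

indexed = "alice@example.com" # hypothetical stored address
query   = "bob@example.com"   # hypothetical search string

# Split tokens overlap on "example" and "com", so a different address still matches.
shared_split = split_tokens(indexed) & split_tokens(query)
puts shared_split.inspect

# Whole tokens only overlap when the two addresses are identical.
shared_whole = whole_token(indexed) & whole_token(query)
puts shared_whole.inspect
```

The first overlap is non-empty and the second is empty, which is the difference between an unguided search against _all and a search against a field that keeps the address whole.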
The way I would fix this is to use a term query for the emails and make sure to specify the field to search on. A term query is faster because it skips the query parsing step that the query_string query has (that parser is why prefixing the string with "email:" sends it to the right field). Also, you don't need a custom analyzer unless you are indexing a field that contains free text as well as URLs and emails. If the field only contains emails, just set index: not_analyzed and it will remain a single token. (You might want a custom analyzer that lowercases the email, though.)
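For the lowercasing variant, a sketch of what the settings and field options could look like as plain Ruby hashes, in the same shape as the settings argument in the question. The analyzer name lowercase_email and the two constant names are made up; the keyword tokenizer and lowercase filter are built into elasticsearch.

```ruby
require 'json'

# Hypothetical analyzer: keep the whole value as one token, then lowercase it.
ANALYSIS_SETTINGS = {
  :analysis => {
    :analyzer => {
      :lowercase_email => {
        'type'      => 'custom',
        'tokenizer' => 'keyword',
        'filter'    => ['lowercase']
      }
    }
  }
}

# Options for the email field: use the analyzer above and keep the value out of _all.
EMAIL_FIELD_OPTIONS = {
  :type           => 'string',
  :analyzer       => 'lowercase_email',
  :include_in_all => false
}

puts JSON.pretty_generate(ANALYSIS_SETTINGS)
puts JSON.pretty_generate(EMAIL_FIELD_OPTIONS)
```

Keep in mind that a term query does not analyze its input, so with a lowercasing analyzer on the field you would need to downcase the search value yourself before sending it.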
Make your search query like this:
"term": {
  "email": "[email protected]"
}
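If the search box accepts both emails and free text, one way to combine the two is to pull the email-looking words out and send them as term clauses, leaving the rest for query_string. This is only a sketch in plain Ruby that builds the request body: build_query and SIMPLE_EMAIL are made-up names, and SIMPLE_EMAIL is a crude stand-in for the EMAIL_REGEXP from the question, which isn't shown.

```ruby
require 'json'

# Assumed pattern, standing in for the question's EMAIL_REGEXP.
SIMPLE_EMAIL = /\A[^@\s]+@[^@\s]+\.[^@\s]+\z/

# Hypothetical helper: build a bool query with one term clause per email
# and a single query_string clause for the remaining words.
def build_query(raw_query)
  emails, words = raw_query.to_s.split(" ").partition { |w| w =~ SIMPLE_EMAIL }

  # Term values are sent as typed; downcase them only if the field is lowercased at index time.
  must = emails.map { |email| { "term" => { "email" => email } } }
  must << { "query_string" => { "query" => words.join(" ") } } unless words.empty?

  # With nothing to search for, fall back to matching everything.
  return { "query" => { "match_all" => {} } } if must.empty?

  { "query" => { "bool" => { "must" => must } } }
end

# The address here is invented for the example.
puts JSON.pretty_generate(build_query("conference alice@example.com"))
```

The email goes to the email field via term, so it never touches _all, and the remaining words still get normal query_string parsing.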
Good luck!
Upvotes: 3