Several very specific elasticsearch questions

Question

I have a few very specific questions about Rails + Tire + ElasticSearch.

I have watched the Railscast about it and read a lot of documentation, but to be honest it is over my head. I would love for someone to help me understand the finer points that I can't quite grasp.

Here is the elasticsearch portion of my model, Resource.rb:

  include Tire::Model::Search
  include Tire::Model::Callbacks

  mapping do
    indexes :url
    indexes :title,       :boost => 3
    indexes :description, :boost => 2
    indexes :category,    :boost => 1.5, type: 'object',
              properties: {
                name: { type: 'multi_field',
                  fields: { name: { type: 'string', analyzer: 'keyword' } } } }
    indexes :user, type: 'object',
              properties: {
                  username: { type: 'multi_field',
                      fields: { username: { type: 'string', analyzer: 'keyword' } } } }
  end  

  def self.elasticsearch(params)
    tire.search(load: true, page: params[:page], per_page: 20) do
      query { string params[:e], default_operator: "OR" } if params[:e].present?
    end
  end

  def to_indexed_json
    to_json( include: { user: { only: [:username] }, 
                    category: { only: [:name] } 
           } )
  end

1. What does 'not_analyzed' mean? Many of the tutorials I'm reading use it. If a field is not analyzed, why is it included in the mapping block?
2. What is the purpose of using indexes? For example, something like indexes :id, type: 'integer'. Why would an integer need to be indexed? Does that help with performance or something?
3. How do I modify the analyzer for the URL so it works better? For example, if it's stored as http://www.dropbox.com, searching for dropbox.com doesn't find a result, but www.dropbox.com does. I have tried pasting in all the different analyzers and none of them really work for the URL.
4. If my category.name is stored as a plural, e.g. 'books', 'movies', 'tapes', how can I tell the analyzer to match the word in its singular form as well as its plural form? Searching for 'movie' doesn't work, but 'movies' does.
5. When I remove load: true, my entire site breaks. This was covered in the Railscast, but only for a moment. Does that mean I need to move EVERY attribute (and association) into the mapping, and change it to :not_analyzed? (I just realized... maybe I just answered my own question #1!)
6. In general, what type of data works best for OR and what works best for AND? I'm thinking OR seems more lenient as far as getting more results.

javanna · Accepted Answer

It's all about Lucene. An indexed field is a field that you can search on. When you index a field, you decide whether to analyze it or not. Not analyzing it means indexing the value as it is, without tokenizing it or applying any token filter. Otherwise you apply an analyzer to it. Lucene ships with several analyzers out of the box, and elasticsearch exposes them as well. An analyzer is composed of a tokenizer and a list of token filters. The tokenizer determines how the field content is split into terms. The token filters then remove and/or modify those terms.
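To make the difference between an analyzed and a not-analyzed field concrete, here is a toy model in plain Ruby. It is only a sketch of the idea, not how Lucene is implemented; the constants and the `analyze` helper are made up for this example.

```ruby
# Toy model of analysis; not Lucene's actual implementation.
# A tokenizer splits text into terms; token filters then modify or drop them.
WHITESPACE_TOKENIZER = ->(text)  { text.split(/\s+/).reject(&:empty?) }
LOWERCASE_FILTER     = ->(terms) { terms.map(&:downcase) }
STOP_FILTER          = ->(terms) { terms - %w[a an the of] }

# Hypothetical helper: run the tokenizer, then each filter in order.
def analyze(text, tokenizer, filters)
  filters.reduce(tokenizer.call(text)) { |terms, filter| filter.call(terms) }
end

text = "The Art of Ruby"

# Analyzed: the field becomes searchable by its individual terms.
analyzed = analyze(text, WHITESPACE_TOKENIZER, [LOWERCASE_FILTER, STOP_FILTER])

# Not analyzed: the whole value is indexed as one exact term.
not_analyzed = [text]

p analyzed      # => ["art", "ruby"]
p not_analyzed  # => ["The Art of Ruby"]
```

With the not-analyzed version, only a search for the exact string "The Art of Ruby" would match, which is usually what you want for ids, tags, or usernames.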

For example, one of the most common ways of tokenizing is the WhitespaceTokenizer, which splits the text on whitespace. You can then apply stemming, so that the stems of the terms are indexed: running becomes run, and plural terms become singular.
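The effect of stemming on a search like 'movie' versus 'movies' can be sketched with a tiny inverted index in plain Ruby. The stemmer below is deliberately naive and only an assumption for illustration; real stemmers such as Porter or Snowball handle far more cases. The point is that the query passes through the same analysis as the indexed text.

```ruby
# Deliberately naive stemmer for illustration only.
def naive_stem(term)
  stem = term.downcase.sub(/(ing|s)\z/, "")
  stem.sub(/([b-df-hj-np-tv-z])\1\z/, '\1') # collapse "runn" to "run"
end

# Whitespace tokenizing followed by stemming.
def analyze(text)
  text.split(/\s+/).map { |term| naive_stem(term) }
end

# Inverted index: stem => ids of the documents containing it.
docs  = { 1 => "movies", 2 => "books", 3 => "running shoes" }
index = Hash.new { |hash, key| hash[key] = [] }
docs.each { |id, text| analyze(text).each { |stem| index[stem] << id } }

# The query is analyzed the same way as the indexed text.
def search(index, query)
  analyze(query).flat_map { |stem| index.fetch(stem, []) }.uniq
end

p search(index, "movie") # => [1]
p search(index, "run")   # => [3]
```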

Sometimes (pretty often, actually) you need to create your own analyzer by combining a tokenizer with the token filters you want to use. In elasticsearch you can do this by defining a custom analyzer in your index settings.
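As a rough sketch, a custom analyzer definition could look like the settings hash below. The analyzer name `english_stemmed` is made up for this example; `standard`, `lowercase` and `snowball` are a built-in tokenizer and built-in token filters. I am assuming here that with Tire you would pass such a hash to `settings` wrapped around your `mapping` block, so check that against the Tire README for your version.

```ruby
require "json"

# Index settings defining a custom analyzer.
# "english_stemmed" is a hypothetical name chosen for this example.
ANALYSIS_SETTINGS = {
  analysis: {
    analyzer: {
      english_stemmed: {
        type:      "custom",
        tokenizer: "standard",
        filter:    %w[lowercase snowball]
      }
    }
  }
}

# Mapping fragment that points a field at the analyzer by name.
CATEGORY_NAME_MAPPING = {
  name: { type: "string", analyzer: "english_stemmed" }
}

# Print the JSON that would be sent as the index settings.
puts JSON.pretty_generate(ANALYSIS_SETTINGS)
```

A field mapped with this analyzer is stemmed at index time and at query time, which is what makes a search for 'movie' match a stored 'movies'.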

Sometimes there are fields that you don't want to index in Lucene, since you are not going to search on them, but that you do want to store. A stored field is a field that can be returned within the search results: Lucene can search on indexed fields, but can only return stored ones. Luckily, elasticsearch stores the whole _source document by default, so you get back the whole document that you indexed. You can disable this feature if you don't want to store the source in elasticsearch.

If you don't want the whole source back when querying, you can specify the list of fields that you want. If those fields are stored (you can configure this in your mapping; by default each field is indexed but not stored), they are returned straight away; otherwise they are extracted from the source itself (unless it is disabled). If you have big documents, I'd suggest specifying the fields that you want back, otherwise you get the whole source every time.
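For illustration, a search request that asks for only a couple of fields could be built like this. The `search_body` helper is made up for this example, and it assumes the top-level `fields` key of the search request body as used by elasticsearch versions from around the time of this answer; newer releases express the same idea with `_source` filtering.

```ruby
require "json"

# Hypothetical helper: build a search request body that asks for only
# some fields instead of the whole _source.
def search_body(query, fields)
  {
    query:  { query_string: { query: query, default_operator: "OR" } },
    fields: fields
  }
end

body = search_body("dropbox", %w[title url])

# Print the JSON that would be sent to the _search endpoint.
puts JSON.generate(body)
```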
