user3661165

Reputation: 3

Elasticsearch and Free-Form Text

I am new to Elasticsearch. All of the examples, tutorials and questions I can find seem to indicate that Elasticsearch requires documents to be formatted as field-value pairs. Is it possible to search free-form text documents or logs that have no specific key-value structure in their formatting? If so, can someone please point me to an example or documentation?

Upvotes: 0

Views: 928

Answers (1)

Luc DUZAN

Reputation: 1319

Yes, it is entirely possible: create documents with a single field that contains all of your free-form text. This works well because Elasticsearch is very good at full-text search. For example, you can index a document such as:

{
  "text": "An alpaca (Vicugna pacos) is a domesticated species of South American camelid. It resembles a small llama in appearance. There are two breeds of alpaca; the Suri alpaca and the Huacaya alpaca. Alpacas are kept in herds that graze on the level heights of the Andes of southern Peru, northern Bolivia, Ecuador, and northern Chile at an altitude of 3,500 m (11,500 ft) to 5,000 m (16,000 ft) above sea level, throughout the year.[1] Alpacas are considerably smaller than llamas, and unlike llamas, they were not bred to be beasts of burden, but were bred specifically for their fiber. Alpaca fiber is used for making knitted and woven items, similar to wool. These items include blankets, sweaters, hats, gloves, scarves, a wide variety of textiles and ponchos in South America, and sweaters, socks, coats and bedding in other parts of the world. The fiber comes in more than 52 natural colors as classified in Peru, 12 as classified in Australia and 16 as classified in the United States.[2]"
}

If you do that and you know that all of your text will be in English (or some other single language), you should give Elasticsearch your own mapping with a matching analyzer. For example, here is how to create a mapping, index a document, and run a search:

PUT /freetext 
{
  "mappings": {
    "text": {
      "properties": {
        "text": {
          "type": "string",
          "analyzer": "english"
        }
      }
    }
  }
}

PUT /freetext/text/alpaga
{
    "text" : "alpaga are awesome"
}

GET /freetext/text/_search?q="alpaga"
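The `q=` parameter above is the short URI form of a search. As a rough sketch, a similar search can be written as a request body with a `match` query on the `text` field. The helper name `build_match_search` is made up for illustration, and the snippet only builds and prints the JSON; sending it to the server is left out.

```python
import json


def build_match_search(field, words):
    """Return a search request body that runs a full-text match query on one field."""
    return {"query": {"match": {field: words}}}


body = build_match_search("text", "alpaga")
# This JSON would be sent as the body of GET /freetext/text/_search
print(json.dumps(body, indent=2))
```

Because the query names the field explicitly, the search terms go through the analyzer configured for that field in the mapping.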

You may also be interested in a more specialized analyzer; I think this one is good:

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-snowball-analyzer.html

But before doing that, you should think about your data; I am sure you will always find some other fields. For example, you will probably want to index your logs with specific fields for the date, IP, application name, and type of information (error, warning, information). Most text documents have at least an author and a date.
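As an illustration of pulling such fields out of a log line before indexing, here is a minimal Python sketch. The line layout, the field names and the function `parse_log_line` are all assumptions made for this example; real logs will need their own pattern.

```python
import re
from datetime import datetime

# Hypothetical line layout: "<date> <time> <ip> <application> <LEVEL> <free text>"
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"(?P<application>\S+) "
    r"(?P<level>ERROR|WARNING|INFO) "
    r"(?P<message>.*)$"
)


def parse_log_line(line):
    """Split one log line into structured fields plus the remaining free text.

    Lines that do not fit the assumed layout are kept whole in "message",
    so they can still be indexed and searched as free-form text.
    """
    line = line.strip()
    match = LOG_LINE.match(line)
    if match is None:
        return {"message": line}
    doc = match.groupdict()
    try:
        parsed = datetime.strptime(doc["date"], "%Y-%m-%d %H:%M:%S")
    except ValueError:
        # The digits matched but are not a real date: fall back to free text.
        return {"message": line}
    doc["date"] = parsed.isoformat()  # ISO 8601, e.g. 2014-05-22T10:15:03
    doc["level"] = doc["level"].lower()
    return doc


doc = parse_log_line(
    "2014-05-22 10:15:03 192.168.0.12 billing ERROR could not reach database"
)
print(doc)
```

Each returned dictionary can then be indexed as one document, with the free text still available in its own field.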

By separating out this special information and telling Elasticsearch what it is (a date, or text that should not be analyzed), you will then be able to run searches such as:

  • Give me all logs about errors from the last two days
  • Count the number of articles for each author
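The two searches listed above could look roughly like the request bodies built below. This is only a sketch: the field names (`date`, `level`, `author`, `message`), the aggregation name `per_value` and both helper functions are assumptions for illustration. The field definitions use the same `string` style as the mapping earlier in this answer.

```python
import json

# Field definitions for the assumed fields. "not_analyzed" keeps a value as one
# exact term; newer Elasticsearch versions use "type": "keyword" for this instead.
properties = {
    "date": {"type": "date"},
    "level": {"type": "string", "index": "not_analyzed"},   # for logs
    "author": {"type": "string", "index": "not_analyzed"},  # for articles
    "message": {"type": "string", "analyzer": "english"},
}


def errors_since(days):
    """Documents whose level is exactly "error" and whose date is within the last `days` days."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"level": "error"}},
                    {"range": {"date": {"gte": "now-%dd" % days}}},
                ]
            }
        }
    }


def count_by(field):
    """Number of documents per distinct value of `field`, using a terms aggregation."""
    return {
        "size": 0,  # return only the counts, not the matching documents
        "aggs": {"per_value": {"terms": {"field": field}}},
    }


print(json.dumps(errors_since(2), indent=2))
print(json.dumps(count_by("author"), indent=2))
```

Both bodies would be sent to the `_search` endpoint of the relevant index. The exact-match `term` query and the per-author counts only behave as intended when those fields are not analyzed, which is why the mapping matters here.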

Upvotes: 1
