D vignesh

Reputation: 97

How Elasticsearch is implemented at the physical level

I want to know how a command to index a document (PUT /index/type/id {data}) is executed and how the document is stored at the physical level. I also want to know what the terms index, type, document and shard mean at the hardware level. Could anyone please explain the hardware-level implementation of Elasticsearch?

Upvotes: 0

Views: 146

Answers (1)

Mehmet Ekin Uygur

Reputation: 63

I have tried to briefly summarize the important parts of Elasticsearch for you, but I definitely recommend reading Elasticsearch: The Definitive Guide.

Brief Summary

Inverted Indexes:

At its core, Elasticsearch is built on a technology called Lucene (first released in 1999, and later an Apache project). Lucene was designed to create inverted indexes from the documents inserted into the system. Elasticsearch came later and made this system distributed, and that is where much of the power of Elasticsearch comes from.

To understand Elasticsearch, you first need to understand indexing. Indexing basically means creating an inverted index from a given document. Please read the following link to understand inverted indexes.

Link: Inverted Index

Distributed execution of requests:

Elasticsearch is a distributed technology, which makes it possible to have multiple nodes running Elasticsearch in the same network. Together, these nodes form a cluster.

If you run a GET request on a specific index, you get its schema information, like the following:

{
 "mappings": { --> mappings component of the index defines the structure of what fields contain what kind of data
   ...
 },
 "settings": { --> defines configuration settings for an index
   ...
 }
}

and if you look at the settings section in detail, you might find something very similar to the following:

"settings": {
   "index": {
     "creation_date": "1504387536767",
     "number_of_shards": "5",           
     "number_of_replicas": "1",
     "uuid": "bfVuaeZtTYyWffcqKHOP_w",
     "version": {
       "created": "5050199"
     },
     "provided_name": "noname"
   }
 }

number_of_shards: This is the number of primary shards the index is split into. A shard is the physical unit of an index: each shard is a Lucene index of its own and holds a subset of the index's documents.

number_of_replicas: This defines how many replica (backup) copies are kept of each primary shard.

Now let's look at the following layout
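As a small worked example of what these two settings imply, the sketch below computes how many shard copies the cluster has to place for the settings shown above. The function name total_shard_copies is made up for illustration; only the two setting names come from Elasticsearch.

```python
# Settings copied from the example above (Elasticsearch returns them as strings).
settings = {"index": {"number_of_shards": "5", "number_of_replicas": "1"}}


def total_shard_copies(index_settings: dict) -> int:
    """Primaries plus the replica copies kept of each primary."""
    primaries = int(index_settings["index"]["number_of_shards"])
    replicas_per_primary = int(index_settings["index"]["number_of_replicas"])
    return primaries * (1 + replicas_per_primary)


print(total_shard_copies(settings))  # 10 (5 primaries + 5 replicas)
```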

cluster

|-------------------------------------------------------------------------------------------------------|
|  node1                    node2                                                                       |
|  |---------------|        |---------------|                                                           |
|  | P0            |        | P1            |  --> P0 is in node1 and its replica(R0) is in node2       |
|  | R1            |        | R0            |      P1 is in node2 and its replica(R1) is in node1       |
|  |               |        |               |                                                           |
|  |               |        |               |                                                           |
|  |---------------|        |---------------|                                                           |
|-------------------------------------------------------------------------------------------------------|

P represents primary shards

R represents replica shards

In this example, if either node goes down we can still reach all the data, because node1 holds P0 and R1 and node2 holds P1 and R0, so each node has a copy of both shards. This setup also allows read requests to be load-balanced across primary and replica shards.

This is just a very simple example to give you an idea of how Elasticsearch spreads data and work across nodes, without getting into the details.
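To make the diagram a bit more concrete, here is a rough sketch of how a document could be assigned to a shard and why losing one node does not lose data. Elasticsearch picks the shard by hashing a routing value (the document id by default) and taking it modulo the number of primary shards. The zlib.crc32 call below is only a stand-in for Elasticsearch's own hash function, and the names pick_shard, layout, nodes_for and readable_after_failure are invented for this example.

```python
import zlib


def pick_shard(routing_key: str, number_of_primary_shards: int) -> int:
    """Map a routing key (by default the document id) to a primary shard number."""
    # Assumption: crc32 stands in for the hash Elasticsearch really uses.
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_primary_shards


# Layout from the diagram: shard number -> (node holding primary, node holding replica)
layout = {0: ("node1", "node2"), 1: ("node2", "node1")}


def nodes_for(doc_id: str) -> tuple:
    """Nodes that hold a copy of the shard this document lands in."""
    return layout[pick_shard(doc_id, len(layout))]


def readable_after_failure(doc_id: str, failed_node: str) -> bool:
    """True if at least one copy of the document's shard survives the failure."""
    return any(node != failed_node for node in nodes_for(doc_id))


print(nodes_for("parrot"))
print(readable_after_failure("parrot", "node1"))
```

Because the shard is derived from the number of primary shards, that number cannot simply be changed on an existing index without moving documents around.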

Text analysis:

When we send a document to Elasticsearch, its text goes through an analysis process. This process splits the text into tokens and applies filters if any are configured (such as stop word removal, lowercasing, stemming and synonyms). The resulting tokens are what end up in the inverted index.

If we index two documents as shown in the example below, Elasticsearch ends up creating a Lucene index table like the following
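A minimal sketch of such an analysis step, assuming a simple word tokenizer followed by lowercasing and stop word removal. This is not Elasticsearch's actual analyzer code; the function analyze and the STOP_WORDS list are made up for illustration.

```python
import re

# Illustrative stop word list, not the one Elasticsearch ships with.
STOP_WORDS = frozenset({"is", "the", "in"})


def analyze(text: str, stop_words: frozenset = frozenset()) -> list:
    """Tokenize, lowercase, then drop stop words."""
    tokens = re.findall(r"\w+", text)           # tokenizer: runs of word characters
    tokens = [token.lower() for token in tokens]  # lowercase filter
    return [token for token in tokens if token not in stop_words]  # stop filter


print(analyze("I think elastic is the next big thing", STOP_WORDS))
# ['i', 'think', 'elastic', 'next', 'big', 'thing']
```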

Example:

doc1 = I think elastic is the next big thing

doc2 = next big thing is in big data

Once the analyzer has processed these two documents, it creates an inverted index like the following

  |---------|-------------|-------------|
  |  Token  | Exists in 1 | Exists in 2 |
  |---------|-------------|-------------|
  |I        |      X      |             |
  |think    |      X      |             |
  |elastic  |      X      |             |
  |is       |      X      |      X      |
  |the      |      X      |             |
  |next     |      X      |      X      |
  |big      |      X      |      X      |
  |thing    |      X      |      X      |
  |in       |             |      X      |
  |data     |             |      X      |
  |---------|-------------|-------------|

After creating the inverted index for the documents, we can search for a word. If we search for the word "elastic", it returns only the first document. If we search for the word "big", it returns both documents. If we search for the word "data", it returns only the second document.
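The lookup described here can be sketched in a few lines of Python. This is only a toy model of the idea, assuming whitespace tokenization with no filters. Lucene's real on-disk structures are far more elaborate, and the names build_inverted_index and search are made up for this example.

```python
from collections import defaultdict

docs = {
    1: "I think elastic is the next big thing",
    2: "next big thing is in big data",
}


def build_inverted_index(documents: dict) -> dict:
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.split():
            index[token].add(doc_id)
    return index


def search(index: dict, word: str) -> list:
    """Return the ids of the documents containing the word, in ascending order."""
    return sorted(index.get(word, set()))


index = build_inverted_index(docs)
print(search(index, "big"))   # [1, 2]
print(search(index, "data"))  # [2]
```

The point is that a search never scans the documents themselves: it looks the word up in the token table and reads off the matching document ids.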

Extra:

RDBMS to Elasticsearch

If you are familiar with relational databases, the following chart might also help you see which Elasticsearch concept roughly corresponds to which relational concept

 |Elasticsearch|Relational DB|
 |Field        |Column       |
 |Document     |Row          |
 |Type         |Table        |
 |Index        |Database     |

Two different meanings of the word "index"

It is also worth mentioning that the word "index" has two different meanings in Elasticsearch.

The first is as a noun, for what you could think of as an Elasticsearch database. Such a database is called an index (for example, an Animals index). An index has types in it (in this case, we could have Amphibians, Reptiles and Mammals). Inside the types we have documents, and inside the documents we have fields.

The second is as a verb, for inserting a document into an index. This is called "indexing a document". For example, the following PUT request triggers Elasticsearch to index the given document.

 PUT /animals/birds/parrot
 {
  "info": "Parrots, also known as psittacines",
  "lifespan": 50,
  "avglength": 3.4
 }
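If you want to send the same request from code rather than from a console, a rough sketch with Python's standard library could look like this. The host and port (localhost:9200) are an assumption about a local test node, the helper names build_index_request and send are made up, and the index/type/id path is taken as-is from the request above.

```python
import json
import urllib.request

# Assumption: a test node listening on the default local address.
BASE_URL = "http://localhost:9200"

document = {
    "info": "Parrots, also known as psittacines",
    "lifespan": 50,
    "avglength": 3.4,
}


def build_index_request(index: str, doc_type: str, doc_id: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the PUT request that indexes one document."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/{doc_type}/{doc_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply; needs a reachable node."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_index_request("animals", "birds", "parrot", document)
print(request.get_method(), request.full_url)
```

Calling send(request) is left out on purpose, since it only works when a node is actually running at that address.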

Upvotes: 0
