Bhawan

Reputation: 2491

sequence number vs version number in elasticsearch

While reading about Elasticsearch 7.4 concepts, I came across two fields: _seq_no and _version.

As per the documentation:

Version

Returns a version for each search hit.

Sequence Numbers and Primary Term

Returns the sequence number and primary term of the last modification to each search hit.

But it does not explain when the two will be different or the same for a document.

I created an index named test:

PUT /test
{
  "mappings": {
    "properties": {
      "total_price" : {
        "type": "integer"
      },
      "final_price": {
        "type": "integer"
      },
      "base_price": {
        "enabled": false
      }
    }
  }
}

I update the full document with the PUT (index) API:

PUT /test/_doc/2
{
  "total_price": 10,
  "final_price": 10,
  "base_price": 10
}

Both _seq_no and _version increase in this case.

When I do a partial update with the Update API:

POST /test/_doc/2/_update
{
    "doc" : {
        "base_price" : 10000
    }
}

Both _seq_no and _version increase in this case too.

So I cannot find a case where one field changes but the other does not.
When will the two fields differ?

Upvotes: 10

Views: 11819

Answers (2)

Ahmad

Reputation: 1029

Elasticsearch documents are immutable

Elasticsearch documents are immutable: whenever you update a document, a new version of it is created, whether you use PUT (replacing the entire document) or POST (updating only some parts of it).

Each new version of the document gets an incremented version number, exposed in the _version field, as in this update response:

{
    "_index": "movies",
    "_type": "_doc",
    "_id": "109487",
    "_version": 14,
    "result": "updated",
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "_seq_no": 17,
    "_primary_term": 7
}

Blog website

Imagine that you run a blog website, and 2 users open the same blog post (id 1) at the same time: GET https://myblog.com/posts/1

In Elasticsearch, the post document has a field named view_count, which stores the total number of times the post has been viewed.

To increment view_count, you first send a GET request to read the current value:

GET /posts/_doc/1
{
    "_index": "posts",
    "_type": "_doc",
    "_id": "1",
    "_version": 12,
    "_seq_no": 15,
    "_primary_term": 7,
    "found": true,
    "_source": {
        "post": "Lorem ipsum ...",
        "title": "My title",
        "published_at": "2020-01-01",
        "view_count": 10
    }
}

Then you update view_count of post 1 by adding 1 to the value returned by the GET:

POST /posts/_update/1
{
    "doc": {
        "view_count": 11
    }
}

There is a problem here.

Since both users opened the same post page at the same time, both GET requests return view_count 10, so both users send an update setting view_count to 11.

As a result, 11 is stored. That is incorrect: the document was updated twice (remember, 2 users hit the post at the same time), so the value should be 12.

But why? Because both users read the value 10 before either of them wrote, the second update simply overwrote the first one.
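A minimal sketch of this race, assuming a plain Python dict (`store`, a hypothetical stand-in for the posts index) and interleaving the two users' requests by hand. It is a simulation of the read-modify-write pattern, not Elasticsearch client code:

```python
# Toy simulation (not the Elasticsearch client): two users do an
# unconditional read-modify-write on the same counter.
# `store` is a hypothetical stand-in for the posts index.
store = {"1": {"view_count": 10}}


def read(doc_id):
    # Return a copy of the source, as a GET response would.
    return dict(store[doc_id])


def write(doc_id, source):
    # Unconditional update: the last writer wins.
    store[doc_id] = dict(source)


# Both users read before either of them writes (the concurrent case).
seen_by_user_a = read("1")["view_count"]
seen_by_user_b = read("1")["view_count"]

write("1", {"view_count": seen_by_user_a + 1})
write("1", {"view_count": seen_by_user_b + 1})

print(store["1"]["view_count"])  # 11, although two views happened
```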

So, how do we solve this issue?

Fortunately, Elasticsearch supports optimistic concurrency control (OCC) (see Optimistic concurrency control - Wikipedia).

To make sure you only update the document if nobody has changed it since you read it, send the if_seq_no and if_primary_term values returned by the GET request along with the update:

POST /posts/_update/1?if_seq_no=15&if_primary_term=7
{
    "doc": {
        "view_count": 11
    }
}

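If another client has written to the document since your GET, its _seq_no no longer matches, so Elasticsearch rejects your update with a 409 version conflict instead of overwriting the other write. You then re-read the document and retry, and no increment is lost.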
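The read, conditional write, and retry cycle can be sketched with a toy in-memory index. `ToyIndex`, `VersionConflict` and `increment_views` are hypothetical names for illustration only, not part of any Elasticsearch client. The model assumes a single shard and mimics just one rule: a write carrying if_seq_no / if_primary_term is rejected (in Elasticsearch, with a 409 conflict) unless both still match the document's current values.

```python
# Toy model of optimistic concurrency control; hypothetical, not a real client.


class VersionConflict(Exception):
    """Stands in for Elasticsearch's HTTP 409 version conflict."""


class ToyIndex:
    def __init__(self):
        self.docs = {}        # doc_id -> (source, seq_no, primary_term)
        self.next_seq_no = 0  # single shard, so a single counter
        self.primary_term = 1

    def get(self, doc_id):
        source, seq_no, term = self.docs[doc_id]
        return {"_source": dict(source), "_seq_no": seq_no, "_primary_term": term}

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        # Reject a conditional write whose seq_no/primary_term are stale.
        if if_seq_no is not None:
            _, seq_no, term = self.docs[doc_id]
            if (seq_no, term) != (if_seq_no, if_primary_term):
                raise VersionConflict(doc_id)
        self.docs[doc_id] = (dict(source), self.next_seq_no, self.primary_term)
        self.next_seq_no += 1


def increment_views(idx, doc_id, max_retries=5):
    """Read, write conditionally, and re-read and retry on conflict."""
    for _ in range(max_retries):
        hit = idx.get(doc_id)
        count = hit["_source"]["view_count"]
        try:
            idx.index(doc_id, dict(hit["_source"], view_count=count + 1),
                      if_seq_no=hit["_seq_no"],
                      if_primary_term=hit["_primary_term"])
            return
        except VersionConflict:
            continue  # someone else wrote first; start over with fresh values
    raise RuntimeError("too many conflicts")


idx = ToyIndex()
idx.index("1", {"view_count": 10})

# Both users read the same snapshot.
a = idx.get("1")
b = idx.get("1")

# User A's conditional write succeeds.
idx.index("1", {"view_count": a["_source"]["view_count"] + 1},
          if_seq_no=a["_seq_no"], if_primary_term=a["_primary_term"])

# User B's write carries stale values and is rejected, so B retries.
try:
    idx.index("1", {"view_count": b["_source"]["view_count"] + 1},
              if_seq_no=b["_seq_no"], if_primary_term=b["_primary_term"])
except VersionConflict:
    increment_views(idx, "1")

print(idx.get("1")["_source"]["view_count"])  # 12
```

Retrying from a fresh read, rather than resending the same body, is what makes this correct: the retried write is based on the value the other user already stored.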

Upvotes: 4

Val

Reputation: 217434

Sequence numbers were introduced in ES 6.0.0. Just before that release came out, they were explained in detail in this blog article.

But in summary,

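  • _version is a sequential number, kept per document, that counts the number of times that document was written (indexed or updated)
  • _seq_no is a sequential number that counts the operations that happened on the shard holding the document (for a single-shard index like the one below, that means every operation on the index)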

So if you create a second document, you will see that _version and _seq_no differ.
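The two counting rules can be illustrated with a toy single-shard model. `ToyShard` is a hypothetical class, not Elasticsearch code, and it deliberately ignores deletes, no-op updates, replicas and multiple shards:

```python
# Hypothetical toy model of one shard: _version is tracked per document,
# while _seq_no is one counter shared by every operation on the shard.


class ToyShard:
    def __init__(self):
        self.versions = {}  # doc_id -> current _version
        self.next_seq_no = 0

    def write(self, doc_id):
        """Index or update doc_id; return its new (_version, _seq_no)."""
        version = self.versions.get(doc_id, 0) + 1
        self.versions[doc_id] = version
        seq_no = self.next_seq_no
        self.next_seq_no += 1
        return version, seq_no


shard = ToyShard()
print(shard.write("2"))  # (1, 0)  create doc 2
print(shard.write("2"))  # (2, 1)  full PUT of doc 2
print(shard.write("2"))  # (3, 2)  partial update of doc 2
print(shard.write("3"))  # (1, 3)  create a different document
print(shard.write("3"))  # (2, 4)  update that document
print(shard.write("2"))  # (4, 5)  doc 2 again: the gap has grown
```

While only doc 2 is written, its _seq_no stays exactly one behind its _version, which matches what the question observed; writes to doc 3 advance _seq_no without touching doc 2's _version.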

Let's create three documents:

POST test/_doc/_bulk
{"index": {}}
{"test": 1}
{"index": {}}
{"test": 2}
{"index": {}}
{"test": 3}

In the response, you'll get the payload below.

{
  "took" : 166,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "d2zbSW4BJvP7VWZfYMwQ",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "eGzbSW4BJvP7VWZfYMwQ",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "eWzbSW4BJvP7VWZfYMwQ",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 2,
        "_primary_term" : 1,
        "status" : 201
      }
    }
  ]
}

As you can see:

  • for all documents, _version is 1
  • for document 1, _seq_no is 0 (first index operation)
  • for document 2, _seq_no is 1 (second index operation)
  • for document 3, _seq_no is 2 (third index operation)
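To check the same thing on your own bulk response, you can pull the two fields out of each item. A small sketch, assuming the response has already been parsed as JSON and contains no failed items (the embedded copy below is trimmed to the fields used):

```python
import json

# Trimmed copy of the bulk response above (only the fields read below).
bulk_response = json.loads("""
{
  "errors": false,
  "items": [
    {"index": {"_id": "d2zbSW4BJvP7VWZfYMwQ", "_version": 1, "_seq_no": 0}},
    {"index": {"_id": "eGzbSW4BJvP7VWZfYMwQ", "_version": 1, "_seq_no": 1}},
    {"index": {"_id": "eWzbSW4BJvP7VWZfYMwQ", "_version": 1, "_seq_no": 2}}
  ]
}
""")


def version_and_seq_no(response):
    """Yield (_id, _version, _seq_no) for each item of a bulk response."""
    for item in response["items"]:
        # Each item has a single key naming its action (index, create, update, delete).
        (result,) = item.values()
        yield result["_id"], result["_version"], result["_seq_no"]


for doc_id, version, seq_no in version_and_seq_no(bulk_response):
    print(f"{doc_id}  _version={version}  _seq_no={seq_no}")
```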

Upvotes: 20
