user3070752

Reputation: 734

Elasticsearch count API returns the same number even after successful document insertion

I add new documents to an index in Elasticsearch using the bulk API of the Elasticsearch Python module. It returns success, and when I search for the documents in the index I can find them, so I'm sure the insertion worked. However, using the Elasticsearch cat count API (https://www.elastic.co/guide/en/elasticsearch/reference/current/cat-count.html) I get the same number as before inserting the documents. Here's its output:

epoch      timestamp count
1604741686 09:34:46  1297277503

Is it because I have too many documents in the index?

Upvotes: 0

Views: 638

Answers (2)

Amit

Reputation: 32376

As @eocron mentioned, ES uses a refresh to soft-commit changes so that they become available for searching. How often this happens is controlled by a setting called refresh_interval, whose default is 1 second.

But I see you have more than 1.29 billion docs. Please note that the count API uses replica shards to increase its scalability and performance, so it is possible that the data has not yet been replicated to all your replica shards. That can take a lot of time depending on the cluster, nodes, index configuration (more replica shards means more time to replicate all the changes) and the load on the data nodes.

Please read the description of the count API. From the same doc:

The operation is broadcast across all shards. For each shard id group, a replica is chosen and executed against it. This means that replicas increase the scalability of count.

In short, please tell us the following so that we can give more specific reasons:

  1. The refresh_interval of your index.
  2. How long it takes to replicate data to all your replica shards.
  3. How long you wait after sending an update/index request before checking the count.
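
For points 1 and 3, a minimal sketch like the following could help collect the numbers. It uses only the Python standard library and talks to the REST API directly. The cluster address `http://localhost:9200`, the helper names and the polling values are assumptions for illustration, not something taken from your setup.

```python
import json
import time
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without authentication


def extract_refresh_interval(settings_response, index):
    """Read refresh_interval from a GET /<index>/_settings response body.

    The key is missing when the index still uses the default, so fall
    back to "1s" in that case.
    """
    index_settings = settings_response[index]["settings"]["index"]
    return index_settings.get("refresh_interval", "1s")


def get_json(path):
    """Send a GET request to the cluster and decode the JSON response."""
    with urllib.request.urlopen(f"{ES_URL}{path}") as resp:
        return json.load(resp)


def watch_count(index, attempts=5, gap_seconds=1.0):
    """Print refresh_interval once, then the count several times with a pause."""
    settings = get_json(f"/{index}/_settings")
    print("refresh_interval:", extract_refresh_interval(settings, index))
    for _ in range(attempts):
        print("count:", get_json(f"/{index}/_count")["count"])
        time.sleep(gap_seconds)
```

Calling `watch_count("my-index")` right after a bulk request (with your real index name) shows whether the count catches up after a few seconds or stays the same.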

Upvotes: 2

eocron

Reputation: 7536

I think it is because Elasticsearch refreshes its indices only every now and then (the interval is configurable), so you won't see your result immediately, but only after the index has been refreshed. Since you check by hand after your modifications, the index has probably already been refreshed by then, so you see the updated values when you search.

There are a couple of solutions if you want to play with this behavior. They can be useful for tests, but I strongly recommend redesigning your app so that it tolerates this asynchronous behavior in production.

More here - https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-refresh.html
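
One of those options, as a rough sketch for test code: pass `refresh=wait_for` on the bulk request so that it only returns once the new documents are visible, then read the count. The example uses only the standard library. The cluster address, the function names and the typeless `{"index": {}}` action line (which assumes Elasticsearch 7 or later) are assumptions for illustration.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without authentication


def build_bulk_body(docs):
    """Serialize docs as NDJSON: one action line plus one source line per doc."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # target index comes from the URL
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


def bulk_index_and_count(index, docs):
    """Index docs, wait until a refresh makes them visible, then return the count."""
    request = urllib.request.Request(
        f"{ES_URL}/{index}/_bulk?refresh=wait_for",
        data=build_bulk_body(docs).encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        result = json.load(resp)
    if result["errors"]:
        raise RuntimeError("some bulk items failed")
    with urllib.request.urlopen(f"{ES_URL}/{index}/_count") as resp:
        return json.load(resp)["count"]
```

With the elasticsearch Python module, the same `refresh` parameter should be accepted by the bulk call, but that is worth checking against the version you have installed.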

Upvotes: 1
