AlexMaskovyak
AlexMaskovyak

Reputation: 101

Elasticsearch document count returned by _stats versus _count

I'm trying to get statistics/counts on indices in my elasticsearch cluster (1.2.1). I was using the Indices Stats API (_stats endpoint) to get the total number of primary documents and their size on disk. However, I started experimenting with the Count API (_count endpoint) and noticed that the values do not align.

What is the difference between these values? It's not entirely clear from the documentation though a clue in the documentation indicates that the value returned from Indicies Stats can change when refreshing the index. This makes me wonder if this is a lower-level value from the Lucene layer.

Indices Stats API

localhost:9200/my_index/_stats

...snip...

"_all" : {
  "primaries" : {
    "docs" : {
      "count" : 8284,
      "deleted" : 87
    },
  }
}

...snip...

Count API

localhost:9200/my_index/_count

{
  "count" : 6854,
  "_shards" : {
    "total" : 40,
    "successful" : 40,
    "failed" : 0
  }
}

Upvotes: 10

Views: 4716

Answers (1)

Val
Val

Reputation: 217454

Actually, the docs.count you get back from the Indices stats API also includes the count of nested documents present in the index so it will always be greater or equals than the count you get back from the Count API, which only returns the count of top-level documents, i.e. documents that would be returned from a search query.

So, judging by the numbers you posted, it looks like your index contains documents with fields whose type is nested in the mapping. Sounds correct?

Upvotes: 28

Related Questions