floatingpurr

Reputation: 8559

MongoDB Compass shows the wrong minimum value in a key's data distribution

I'm using MongoDB Compass version 1.5.1 for Mac.

When I look at distribution of values, Compass returns plots like the following:

[Screenshot: values distribution]

As you can see, min and max values are shown, but the min values are wrong. I know the minimum value of both of those keys is 1, not 9 and 13.

Does anyone know how to fix this?

Upvotes: 0

Views: 1299

Answers (1)

floatingpurr

Reputation: 8559

Got it. The standard report is based on a sample of at most 1000 documents, so the min and max shown are those of the sample, not of the whole collection.
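To get a feel for how easily a 1000-document sample can miss the true minimum, here is a rough sketch. It assumes a uniform random sample without replacement; the function name and the collection size are made up for illustration and are not taken from Compass.

```python
from fractions import Fraction
from math import comb


def chance_sample_contains_min(total_docs, docs_with_min, sample_size=1000):
    """Probability that a uniform sample (without replacement) contains
    at least one of the documents that hold the minimum value."""
    # A sample can never be larger than the collection itself.
    s = min(sample_size, total_docs)
    # P(miss) = C(total - k, s) / C(total, s)
    miss = Fraction(comb(total_docs - docs_with_min, s), comb(total_docs, s))
    return 1 - miss


# Hypothetical collection: 100,000 documents, exactly one of which has value 1.
print(float(chance_sample_contains_min(100_000, 1)))  # 0.01
```

Under those assumptions the report would show the real minimum only about once in a hundred samples, which fits what I was seeing.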

From the doc:

Sampling in MongoDB Compass is the practice of selecting a subset of data from the desired collection and analyzing the documents within the sample set.

Sampling is commonly used in statistical analysis because analyzing a subset of data gives similar results to analyzing all of the data. In addition, sampling allows results to be generated quickly rather than performing a potentially long and computationally expensive collection scan.

MongoDB Compass employs two distinct sampling mechanisms.

Collections in MongoDB 3.2 are sampled via the $sample operator in the aggregation framework of the core server. This provides efficient random sampling without replacement over the entire collection, or over the subset of documents specified by a query.
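As an illustration of what such a sampling query can look like, here is a small sketch that builds an aggregation pipeline with a `$sample` stage, optionally preceded by a `$match`. The helper name `sample_pipeline` is my own, and this is only a guess at the shape of the pipeline, not what Compass actually sends.

```python
def sample_pipeline(size=1000, query=None):
    """Build an aggregation pipeline that randomly samples `size` documents,
    optionally restricted to documents matching `query`."""
    stages = []
    if query:
        # Narrow the collection first, then sample from the matches.
        stages.append({"$match": query})
    stages.append({"$sample": {"size": size}})
    return stages


print(sample_pipeline())
# With a driver such as pymongo this could be run as (not executed here):
#   docs = list(collection.aggregate(sample_pipeline()))
```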

Collections in MongoDB 3.0 and 2.6 are sampled via a backwards compatible algorithm executed entirely within Compass. It comprises three phases:

  1. Query for a stream of _id values, limit 10000 descending by _id
  2. Read the stream of _ids and save sampleSize randomly chosen values. We employ reservoir sampling to perform this efficiently.
  3. Then query the selected random documents by _id

The choice of sampling method is transparent in usage to the end-user.
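Phase 2 of the list above mentions reservoir sampling. A minimal sketch of the classic version (Algorithm R) follows, assuming the `_id` values arrive as a plain Python iterable. This is not Compass's code, and the integer ids are stand-ins.

```python
import random


def reservoir_sample(stream, sample_size, rng=random):
    """Pick `sample_size` items uniformly at random from a stream of
    unknown length, keeping at most `sample_size` items in memory."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < sample_size:
            # Fill the reservoir with the first `sample_size` items.
            reservoir.append(item)
        else:
            # Replace a random slot with probability sample_size / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < sample_size:
                reservoir[j] = item
    return reservoir


# Stand-in for "a stream of _id values, limit 10000 descending by _id".
ids = range(10_000, 0, -1)
picked = reservoir_sample(ids, 1000)
print(len(picked))  # 1000
```

The selected ids would then be fetched in a single query (phase 3).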

sampleSize is currently set to 1000 documents.
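If you need the exact minimum and maximum rather than the sampled ones, one option is to compute them over the whole collection with an aggregation. Below is a sketch of a pipeline builder. The helper name and the field name `age` are placeholders, and on a large collection this does a full scan.

```python
def min_max_pipeline(field):
    """Build an aggregation pipeline that returns the exact min and max
    of `field` across every document in the collection."""
    return [
        {
            "$group": {
                "_id": None,  # a single group covering all documents
                "min": {"$min": f"${field}"},
                "max": {"$max": f"${field}"},
            }
        }
    ]


print(min_max_pipeline("age"))
# With pymongo this could be run as (not executed here):
#   result = next(collection.aggregate(min_max_pipeline("age")))
```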

Upvotes: 1
