Jesse

Reputation: 10466

Replicating CouchDB to local couch reduces size - why?

I recently started using Couch for a large app I'm working on.

I have a database with 7907 documents and wanted to rename it. I poked around for a bit but couldn't figure out how, so I figured I would just replicate it to a local database with the name I wanted.

The first time I tried, the replication failed; I believe the error was a timeout. I tried again and it finished very quickly, which was a little disconcerting.

After the replication, the new database shows the correct number of records, but its size is about 1/3 of the original's.

Also a little odd: if I refresh Futon, the size of the original fluctuates between 94.6 and 95.5 MB.

This leaves me with a few questions:

  1. Is the 2nd database storing references to the first? If so, can I delete the first without causing harm?

  2. Why would the size be so different? Had the original built indexes that the new one eventually will?

  3. Why is the size fluctuating?

edit:

A few things that might be helpful:

Upvotes: 2

Views: 803

Answers (1)

JasonSmith

Reputation: 73752

Replicating to a new database is similar to compaction. Both have side effects that reduce the size of the new .couch file (incidentally in the case of replication, intentionally in the case of compaction):

  • The B-tree indexes get balanced.
  • Data from old document revisions is discarded.
  • Metadata from previous updates to the DB is discarded.
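
If you want the same size reduction without creating a second database, you can trigger compaction directly. Here is a minimal sketch in Python that builds the request for CouchDB's `POST /{db}/_compact` endpoint. The helper name, the server URL and the database name are placeholders I made up for illustration; the function only constructs the request and does not send it.

```python
import urllib.parse
import urllib.request


def build_compaction_request(server_url: str, db_name: str) -> urllib.request.Request:
    """Build (but do not send) a compaction request for one database.

    `server_url` and `db_name` are illustrative placeholders, not values
    taken from the question.
    """
    # Database names may contain characters such as "/", so escape them.
    quoted_name = urllib.parse.quote(db_name, safe="")
    url = f"{server_url.rstrip('/')}/{quoted_name}/_compact"
    return urllib.request.Request(
        url,
        data=b"",  # the endpoint takes no body
        # CouchDB expects a JSON content type on this endpoint.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually run it you would pass the result to `urllib.request.urlopen(...)`, most likely with admin credentials added. A hosted service such as Cloudant may manage compaction itself, so treat this as applying to a self-hosted CouchDB.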

Replication records checkpoints for each source/target pair, so if you re-replicate from the same source to the same target (i.e. re-run a replication that timed out), it will pick up where it left off. That would explain why your second attempt finished so quickly.
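
For reference, a replication like the one described can be started through the `POST /_replicate` endpoint. The sketch below only builds the request; the helper name and the example database names are assumptions for illustration. Sending the identical request again after a timeout is what lets CouchDB resume from its checkpoint.

```python
import json
import urllib.request


def build_replication_request(
    server_url: str, source: str, target: str, create_target: bool = True
) -> urllib.request.Request:
    """Build (but do not send) a one-off replication request.

    All argument values are hypothetical; substitute your own server
    and database names.
    """
    body = {
        "source": source,
        "target": target,
        # Ask CouchDB to create the target database if it does not exist.
        "create_target": create_target,
    }
    return urllib.request.Request(
        f"{server_url.rstrip('/')}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, `build_replication_request("http://localhost:5984", "old_db", "new_db")` describes a local copy under a new name, which is the closest thing to a rename here.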

Answers:

  1. Replication does not create a reference to another database. You can delete the first without causing harm.
  2. Replicating (and compacting) generally reduces disk usage. If you have any views in any design documents, they will rebuild when you first query them. View indexes are stored in their own .view files, which also consume space.
  3. I am not sure why the size is fluctuating. Browser and proxy caches are the bane of CouchDB (and web) development. But perhaps it is also a result of internal Cloudant behavior (for example, different nodes in the cluster reporting slightly different sizes).
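
Before deleting the original, it is worth a quick sanity check that both databases report the same document count. The sketch below compares two database info documents, as returned by `GET /{db}`. It assumes the size is reported either as `disk_size` (older CouchDB releases) or as `sizes.file` (newer ones); the helper names are mine, and the byte counts in the usage note are made up.

```python
def file_size_bytes(info: dict) -> int:
    """Return the on-disk size from a database info document.

    Assumption: newer servers report `sizes.file`, older ones `disk_size`.
    """
    if "sizes" in info:
        return info["sizes"]["file"]
    return info["disk_size"]


def compare_databases(source_info: dict, target_info: dict) -> dict:
    """Summarise whether the copy looks complete and how much smaller it is."""
    return {
        # Equal counts are a sanity check, not proof of identical content.
        "doc_counts_match": source_info["doc_count"] == target_info["doc_count"],
        # Ratio of target file size to source file size.
        "size_ratio": file_size_bytes(target_info) / file_size_bytes(source_info),
    }
```

With hypothetical inputs such as `{"doc_count": 7907, "disk_size": 99000000}` for the original and `{"doc_count": 7907, "sizes": {"file": 33000000}}` for the copy, the counts match and the ratio comes out at roughly one third, which is consistent with what you are seeing.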

Upvotes: 7
