Combine CouchDB databases with replication while recording source db

Question

I’m just starting out with CouchDB (2.1), and I’m planning to use it to replicate confidential per-user data from a mobile app up to my server. I’ve read that per-user databases are the best way to do this, and I’ve set that up. Each database has a mix of user-created documents of types Foo and Bar.

Now, I’d also like to be able to collect multi-user slices of that data together into one database and build views on it for admin reporting. Say I want a database which contains all the Foos from all users. So far so good: an entry in _replicator, with a filter, from each user database to one target does the job.
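For reference, the setup described above might look roughly like the sketch below. The `type` field, the design document and filter names (`reporting/foos`), the target name `all_foos`, and the host URL are all assumptions for illustration; credentials are left out.

```javascript
// Filter function as it might be stored in a design document of each
// per-user database, e.g. _design/reporting under filters.foos
// (both names are hypothetical). Assumes documents carry a `type` field.
function foosOnly(doc, req) {
  return doc.type === 'Foo';
}

// Build a _replicator document for one per-user source database.
// `source`, `target`, `filter` and `continuous` are standard replication
// document fields; the concrete values are placeholders.
function replicationDoc(sourceDb) {
  return {
    _id: 'foos-from-' + sourceDb,
    source: 'http://localhost:5984/' + sourceDb,
    target: 'http://localhost:5984/all_foos',
    filter: 'reporting/foos',
    continuous: true
  };
}

// Hypothetical per-user database name.
console.log(JSON.stringify(replicationDoc('userdb-616c696365'), null, 2));
```

Nothing in such a replication document lets you attach the source database name to the documents being copied, which is the gap the question is about.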

But looking at the combined database, I can’t tell which user a given Foo came from. I could write the user id into each document within the per-user database but that seems redundant and adds the complexity of validation. Is there any other way?

natevw · Accepted Answer

CouchDB's replicator simply tries to reproduce the exact state of a given document in the target database, and if it can't, it stores more or less the exact source contents anyway (as a conflicting version).

Furthermore the _rev field of a document, which the replication system uses to check if a document needs to be updated, is actually based on (a hash over) the other document fields.

So unfortunately you can't add metadata during replication. This would indeed be handy for this and other per-user vs. shared replication situations, but it's not something CouchDB currently supports, and it would break some optimizations to add support for it.
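To illustrate the point about `_rev`, here is a toy sketch under Node. It is not CouchDB's actual revision algorithm (`toyRev` is a made-up helper that hashes the JSON body with MD5); it only shows why a content-derived revision id cannot stay the same once a field is added in transit.

```javascript
const crypto = require('crypto');

// Toy stand-in for a content-derived revision id. CouchDB's real
// algorithm differs; this only illustrates the dependence on content.
function toyRev(generation, body) {
  const digest = crypto
    .createHash('md5')
    .update(JSON.stringify(body))
    .digest('hex');
  return generation + '-' + digest;
}

const original = { type: 'Foo', title: 'example' };
// The same document with a source marker added "during replication".
const tagged = Object.assign({}, original, { user: 'alice' });

console.log(toyRev(1, original) === toyRev(1, original)); // true
console.log(toyRev(1, original) === toyRev(1, tagged));   // false
```

A target holding the tagged copy would no longer agree with the source about which revision it has, so the two databases could not be matched up by revision id alone.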

"I could write the user id into each document within the per-user database but that seems redundant and adds the complexity of validation. Is there any other way?"

Including something like a .user field in each document is the right solution.
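With such a field in place, a view in the combined database can key on it directly. A minimal sketch, assuming documents have `type` and `user` fields; `foosByUser` is a hypothetical name, and `emit` is stubbed here only so the example runs under Node (inside CouchDB it is provided by the view server).

```javascript
// Stand-in for CouchDB's built-in emit(), so the sketch runs under Node.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function as it might appear in a view of the combined database.
function foosByUser(doc) {
  if (doc.type === 'Foo' && doc.user) {
    emit(doc.user, 1);
  }
}

// Sample documents, invented for illustration.
[
  { _id: 'a', type: 'Foo', user: 'alice' },
  { _id: 'b', type: 'Foo', user: 'bob' },
  { _id: 'c', type: 'Bar', user: 'alice' }
].forEach(foosByUser);

console.log(rows);
```

Paired with a `_count` or `_sum` reduce, a view like this would give per-user Foo totals for admin reporting.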

As for being redundant, I wouldn't think of it that way, or at least not as a bad thing. You'll find that with CouchDB (as with other NoSQL stores) there's a trend to "denormalize" data to begin with. Especially given the things replication lets me do operationally and architecturally, I'd much rather have a self-contained document than one that relies on metadata derived from a database name.

I'm not sure exactly how an extra field will make validation more complex in your case, so I can't fully speak to that. You do want to make sure the user writing the document has set it "honestly", so yes, there is a bit more complication, but it is usually not too burdensome.
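One way to enforce that is a `validate_doc_update` function in a design document of each per-user database on the server. The sketch below assumes the owner is stored in a `user` field and should equal the authenticated user name CouchDB passes in `userCtx.name`; the admin bypass and the error message are illustrative choices, not requirements.

```javascript
// Sketch of a validate_doc_update function for a per-user database.
// CouchDB calls it with (newDoc, oldDoc, userCtx, secObj) on each write.
function validateDocUpdate(newDoc, oldDoc, userCtx, secObj) {
  // Let server admins (e.g. maintenance jobs) through unchecked.
  if (userCtx.roles.indexOf('_admin') !== -1) {
    return;
  }
  // Deletion tombstones carry no body, so skip the field check for them.
  if (newDoc._deleted) {
    return;
  }
  // Assumption: the owner lives in `user` and must match the writer.
  if (newDoc.user !== userCtx.name) {
    throw { forbidden: 'doc.user must match the authenticated user' };
  }
}

// Quick local check with a hand-made user context.
try {
  validateDocUpdate({ type: 'Foo', user: 'bob' }, null, { name: 'alice', roles: [] }, {});
} catch (err) {
  console.log(err.forbidden);
}
```

In a design document the function is stored as a string under the `validate_doc_update` key. It runs against writes arriving through replication as well, so documents pushed up from the mobile app would be checked against the credentials used for that replication.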
