In NoSQL, how do you handle massive updates to common dependant data?

Question

I really want to understand the NoSQL approach, but some aspects baffle me. And the most readily prominent docs don't seem to address them (that I've found, so far).

For example, I'm looking at the CouchDB website...

Self-Contained Data

An invoice contains all the pertinent information about a single transaction the seller, the buyer, the date, and a list of the items or services sold. [...] Self-contained documents, there’s no abstract reference on this piece of paper that points to some other piece of paper with the seller’s name and address. Accountants appreciate the simplicity of having everything in one place. And given the choice, programmers appreciate that, too.

By "one abstract reference" I think they mean an FK, right? And in an analogous SQL DB the "some other piece of paper" would be a row in a sellers table?

Ok, but what happens when it turns out someone messed up and the seller's address is actually on Maple Avenue, not Maple Lane And you have 96,487 invoices with that say Maple Lane.

What is the orthodox NoSQL way of dealing with that inevitability?

Do you scan your 4.8 million invoice "documents" for the 96k with "Lane", dredge them up, and execute 96k writes?

And if so, in this described CouchDB-based app, WHO goes in and performs that? Because, guessing here, but I imagine your front end probably doesn't have a view with a Seller form. Because your sellers are all embedded inside invoices, right? So in NoSQL, does this sort of data correction & maintenance become the DBA's job?

(Also, do you actually repeat all of the seller's info on every single invoice involving that seller? Doesn't that get expensive?)

And in a huge, busy system, how do you ensure that all that repeated seller data is correct and consistent?

I'm considering which storage technology to look at for a series of upcoming projects. NoSQL is obviously extremely popular and widely adopted. In some domains it's kind of the "Golden Path"/default choice. If I want to use PostgreSQL with Node.js I'll have to scrounge for info about less popular libraries and support.

So there's significant real-world pressure towards MongoDB, CouchDB, etc.

Yet in the systems I'm designing, the questions I mention above are going to really matter. Is there a proven, established, and practical way of addressing these concerns?

In NoSQL, how do you handle massive updates to common dependant data?

Answers (1)

Related Questions