Nafine

Reputation: 81

In NoSQL, how do you handle massive updates to common dependent data?

I really want to understand the NoSQL approach, but some aspects baffle me. And the most readily prominent docs don't seem to address them (that I've found, so far).

For example, I'm looking at the CouchDB website...

Self-Contained Data

An invoice contains all the pertinent information about a single transaction: the seller, the buyer, the date, and a list of the items or services sold. [...] Self-contained documents, there’s no abstract reference on this piece of paper that points to some other piece of paper with the seller’s name and address. Accountants appreciate the simplicity of having everything in one place. And given the choice, programmers appreciate that, too.

By "abstract reference" I think they mean an FK, right? And in an analogous SQL DB the "some other piece of paper" would be a row in a sellers table?

Ok, but what happens when it turns out someone messed up and the seller's address is actually on Maple Avenue, not Maple Lane? And you have 96,487 invoices that say Maple Lane.

What is the orthodox NoSQL way of dealing with that inevitability?

Do you scan your 4.8 million invoice "documents" for the 96k with "Lane", dredge them up, and execute 96k writes?

And if so, in this described CouchDB-based app, WHO goes in and performs that? Because, guessing here, but I imagine your front end probably doesn't have a view with a Seller form. Because your sellers are all embedded inside invoices, right? So in NoSQL, does this sort of data correction & maintenance become the DBA's job?

(Also, do you actually repeat all of the seller's info on every single invoice involving that seller? Doesn't that get expensive?)

And in a huge, busy system, how do you ensure that all that repeated seller data is correct and consistent?

I'm considering which storage technology to look at for a series of upcoming projects. NoSQL is obviously extremely popular and widely adopted. In some domains it's kind of the "Golden Path"/default choice. If I want to use PostgreSQL with Node.js I'll have to scrounge for info about less popular libraries and support.

So there's significant real-world pressure towards MongoDB, CouchDB, etc.

Yet in the systems I'm designing, the questions I mention above are going to really matter. Is there a proven, established, and practical way of addressing these concerns?

Upvotes: 4

Views: 2243

Answers (1)

Jonathan Hall

Reputation: 79704

What is the orthodox NoSQL way of dealing with that inevitability?

Two possible approaches:

  1. Essentially the same as the pre-SQL (i.e. paper filing cabinets) way:

    1. Update the master file for the customer.
    2. Use the new address on all new invoices.

    Historical invoices will continue to have wrong data. But that's okay, and arguably even better than the RDBMS way, because it accurately reflects history.

  2. Go to the extra work of updating all the affected documents. With properly built indexes or views, this isn't that hard (you won't have to scan all 4.8 million invoices--your view will direct you to only the invoices actually affected by the change).

    I imagine your front end probably doesn't have a view with a Seller form.

    Why not? If you do seller-based queries, I sure hope you have a seller-based view (or several).

    Because your sellers are all embedded inside invoices, right?

    That's irrelevant. Views can index any part of the data.
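
To make the two approaches concrete, here is a minimal in-memory sketch in Python. It is not CouchDB code: the `sellers` and `invoices` dicts, the `seller_id` field, and every function name are hypothetical stand-ins, and `build_seller_view` only imitates what a real view gives you (a lookup from seller to invoices without scanning every document at query time).

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for a document store.
sellers = {"s1": {"name": "Acme", "address": "12 Maple Lane"}}
invoices = {}


def create_invoice(invoice_id, seller_id, total):
    """Approach 1: copy the seller's current details into the new invoice."""
    invoices[invoice_id] = {
        "seller_id": seller_id,                # assumed field, used as the view key
        "seller": dict(sellers[seller_id]),    # snapshot taken at write time
        "total": total,
    }


def build_seller_view(docs):
    """Rough analogue of a view keyed on seller: seller_id -> invoice ids.

    A real database maintains this index incrementally; rebuilding it here
    on every call is only for illustration.
    """
    view = defaultdict(list)
    for doc_id, doc in docs.items():
        view[doc["seller_id"]].append(doc_id)
    return view


def correct_address(seller_id, new_address, rewrite_history=False):
    """Fix the master record; optionally rewrite the embedded copies too.

    Returns the number of invoices that were rewritten.
    """
    sellers[seller_id]["address"] = new_address
    if not rewrite_history:
        return 0  # Approach 1: old invoices keep what was printed on them
    # Approach 2: the view tells us which invoices to touch.
    affected = build_seller_view(invoices).get(seller_id, [])
    for doc_id in affected:
        invoices[doc_id]["seller"]["address"] = new_address
    return len(affected)
```

Calling `correct_address("s1", "12 Maple Avenue")` changes only the master record, so invoices created afterwards pick up the new address while earlier ones keep Maple Lane. Passing `rewrite_history=True` also rewrites the embedded copies, touching only the invoices the view lists for that seller.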

do you actually repeat all of the seller's info on every single invoice involving that seller?

Of course. You would repeat it every time you print an invoice on paper, right? Your database document is a "document", same as a printed invoice is.

Doesn't that get expensive?

If you're storing your entire database on a mobile phone, maybe. Otherwise, hard drives are cheap these days.
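
A rough back-of-the-envelope check, using the 4.8 million invoices from the question. The 200 bytes of seller data per invoice is purely an assumption for illustration, not a measured figure, and real storage engines add their own overhead and compression.

```python
INVOICE_COUNT = 4_800_000        # figure taken from the question
SELLER_BYTES_PER_INVOICE = 200   # assumption: name + address, roughly


def duplicated_seller_bytes(invoice_count, bytes_per_copy):
    """Total bytes spent on repeating seller data across all invoices."""
    return invoice_count * bytes_per_copy


total = duplicated_seller_bytes(INVOICE_COUNT, SELLER_BYTES_PER_INVOICE)
print(f"{total / 1e9:.2f} GB")  # prints "0.96 GB"
```

Under that assumption the duplication costs on the order of a gigabyte.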

Yet in the systems I'm designing, the questions I mention above are going to really matter.

NoSQL isn't right for every job. If transactional integrity is important (and it likely is for a financial app like the one you seem to be discussing), NoSQL is likely not the right tool.

Think of CouchDB as a sync protocol with a database tacked on for good luck.

If your core feature is the ability to sync, then CouchDB is probably a good fit. If that's not a feature core to your application, then it's probably the wrong tool for the job.

Upvotes: 3
