dsp_099

Reputation: 6121

Denormalizing and querying a giant Couchbase data set

Let's say I have about 1 TB of SQL email data with the following fields:

email, email_host, username, first_name, last_name, last_login

I'm told that MySQL queries like SELECT * FROM emails WHERE email_host = 'gmail.com' are running slow... we're talking 30 minutes or more.

I'd like to try out Couchbase but I'm not sure how to go about denormalizing the data and building the views. It seems that one possible approach, per Couchbase's own video tutorials, is to create a key like:

u::gmail::incremented_id

or

u::john::incremented_id

and then fetch only the u::gmail keys to get a list of keys corresponding to Gmail addresses, instead of querying every document and checking whether its email_host attribute is gmail.com.

Is this a legitimate/good strategy for denormalizing this sample data set, and if so, what is the proper technique for creating a view to preindex u::whatever?
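For reference, a single user document under that scheme, stored under a key such as u::gmail::1027, might look roughly like this (the key format and the values are only illustrative):

{
  "email": "john.doe@gmail.com",
  "email_host": "gmail.com",
  "username": "john.doe",
  "first_name": "John",
  "last_name": "Doe",
  "last_login": "2015-09-01 12:34:56"
}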

Upvotes: 0

Views: 113

Answers (2)

noun

Reputation: 3715

You just need a map function like this:

function(doc) {
  // Index only user documents, keyed by their email host.
  if (doc.type == 'user')
    emit(doc.email_host, doc);
}

Then you query the view using 'gmail.com' as the key to get all the users on Gmail. If the database contains other document types, you should include a type attribute in your documents so you can distinguish between them.
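For example, with the Node.js SDK (2.x) the query could look like this; the bucket, design document, and view names are placeholders to swap for your own:

var couchbase = require('couchbase');
var ViewQuery = couchbase.ViewQuery;

var cluster = new couchbase.Cluster('couchbase://127.0.0.1');
var bucket = cluster.openBucket('emails'); // bucket name is illustrative

// Query the view defined above, using the email host as the key.
var query = ViewQuery.from('users', 'by_email_host') // ddoc/view names are illustrative
                     .key('gmail.com');

bucket.query(query, function (err, rows) {
  if (err) throw err;
  // Each row carries the emitted key, the emitted value, and the document id.
  rows.forEach(function (row) {
    console.log(row.id, row.key);
  });
});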

Upvotes: 1

a4xrbj1

Reputation: 425

I would just add email_host to your document and set it to the part of the address after the '@'. That way every document carries the email host as a field and can be queried on it easily. It should perform well.
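A sketch of how that field could be derived, assuming it is set in application code before the document is saved (the function name is illustrative):

// Derive email_host from the part of the address after the '@'.
function withEmailHost(doc) {
  if (doc.email && doc.email.indexOf('@') !== -1) {
    doc.email_host = doc.email.substring(doc.email.indexOf('@') + 1);
  }
  return doc;
}

// Example: withEmailHost({ email: 'john.doe@gmail.com' }).email_host === 'gmail.com'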

Upvotes: 0
